MAMMALIAN ENDONUCLEASE III, AND DIAGNOSTIC AND THERAPEUTIC USES THEREOF

The research leading to the present invention was funded in part by Grant Nos. CA 16669,
CA 49869, CA 16087 and GM 07308 from the National Institutes of Health. The
government may have certain rights in the invention.

TECHNICAL FIELD OF THE INVENTION

The present invention relates generally to the measurement and repair of DNA damage to
cells resulting from oxidation or ultraviolet radiation, in mammals including animals and
humans, and more particularly to materials identified herein as modulators of DNA damage,
and to the diagnostic and therapeutic uses to which such modulators may be put.



BACKGROUND OF THE INVENTION
When a pyrimidine residue in cellular DNA becomes modified by oxidation, reduction or
hydration of its 5,6 double bond, repair is initiated by a DNA-glycosylase activity which
cleaves the N-glycosyl bond of the damaged residue, releasing the modified base and creating
an abasic (AP) site in the DNA backbone. Such DNA glycosylase activities have been
identified in bacteria, yeast and mammalian species [Brent, Biophys. J., 13:399-401 (1973);
Bacchetti, et al., Biochim. Biophys. Acta, 390:285-297 (1975); Duker, et al., Nature,
255:82-84 (1975); Ness, et al., Biochim. Biophys. Acta, 520:111-121 (1978); Demple, et al.,
Nature, 287:203-208 (1980); Cunningham, et al., Proc. Natl. Acad. Sci. U.S.A., 82:474-478
(1985); Doetsch, et al., Biochemistry, 25:2212-2220 (1986); Boorstein, et al., Biochemistry,
28:6164-6170 (1989)]. The DNA repair enzyme E. coli endonuclease III was the first of such
enzymes to be described. It was identified not on the basis of its DNA glycosylase activity,
but rather by its nicking activity directed against UV-irradiated DNA [Radman, J. Biol.
Chem., 251:1438-1445 (1976)]. Subsequently, it was shown that nicking of UV-irradiated
DNA resulted from 2 enzymatic activities: a DNA-glycosylase which released pyrimidine
(cytosine and/or uracil) hydrates from the DNA backbone, yielding an apyrimidinic (AP) site
[Boorstein, et al., 1989, supra], and an activity which effected strand cleavage via
β-elimination of the 3' phosphate group of the apyrimidinic sugar residue [Bailly, et al.,
Biochemical J., 242:565-572 (1987); Kim, et al., J. Biol. Chem., 264:2739-2745 (1989);
Mazumder, et al., Biochemistry, 30:1119-1126 (1991)]. The latter activity has been termed
an AP lyase to distinguish it from AP endonucleases, such as exonuclease III or
endonuclease IV, which catalyze strand cleavage via hydrolysis of phosphodiester bonds
[Bailly, et al., Nucleic Acids Res., 17:3617-3618 (1989)]. Endonuclease III is one of a group
of enzymes, including T4 endonuclease V and the E. coli Fpg protein (MutM), which
demonstrate both DNA-glycosylase and AP lyase activities [Demple, et al., Ann. Rev.
Biochem., 63:915-948 (1994); Dodson, et al., J. Biol. Chem., 269:32709-32712 (1994)].

WO 97/31612 PCT/US97/03242

In addition to excising pyrimidine hydrates, the DNA-glycosylase activity of endonuclease
III also excises pyrimidine glycols, ring-contracted pyrimidine derivatives, such as
5-hydroxymethylhydantoin, and urea residues composed of the N1-C2-N3 atoms of the
pyrimidine skeleton [Strniste, et al., Proc. Natl. Acad. Sci. U.S.A., 72:1997-2001 (1975);
Demple, et al., 1980, supra; Breimer, et al., J. Biol. Chem., 259:5543-5548 (1984);
Cunningham, et al., 1985, supra]. Enzyme activities functionally analogous to endonuclease
III have been identified in bacteria other than E. coli, in yeast, and in mammalian cells and
tissues through the use of UV-irradiated, chemically oxidized, and γ-irradiated DNA as
substrates [Brent, 1973, supra; Bacchetti, et al., 1975, supra; Duker, et al., 1975, supra;
Ness, et al., 1978, supra; Doetsch, et al., 1986, supra]. Extracts of HeLa cells have been
shown to contain a thymine glycol DNA-glycosylase [Higgins, et al., Biochemistry,
26:1683-1688 (1987)]. It has also been demonstrated that both endonuclease III and HeLa cell
extracts released cytosine hydrate (as well as its deamination product, uracil hydrate) from
UV-irradiated DNA [Boorstein, et al., 1989, supra]. Kim, et al. [1989, supra] described 2, or
possibly 3, UV-endonuclease activities in HeLa cells by monitoring the nicking of
UV-irradiated circular DNA. Huq et al. [Eur. J. Biochem., 206:833-839 (1992)] reported a
25-fold purification of an endonuclease III-like activity from calf thymus and stated that the
N-terminal sequence of this protein was not homologous to other known proteins.

In view of the significance and activity of endonuclease III as recited in the literature, it
would be desirable, and a need therefore exists, to elucidate the mammalian homologs, and
other potentially active fragments, that may be applied to the development of both diagnostic
and therapeutic modalities, to treat the adverse effects of exposure to radiation and oxidation.
It is therefore toward the fulfillment of this need that the present invention is directed.

SUMMARY OF THE INVENTION

In accordance with the present invention, a DNA damage repair modulator has been purified,
the human homolog has been identified, and a fusion protein comprising the human homolog
has been made and isolated. In particular, the modulator of DNA damage repair comprises
mammalian endonuclease III, and specifically, the human homolog thereof, including the
full length cDNA, and the polypeptide for which it codes, and related fusion proteins.

The present invention also relates to a recombinant DNA molecule or cloned gene, or a
degenerate variant thereof, which encodes a DNA repair modulator; preferably a nucleic acid
molecule, in particular a recombinant DNA molecule or cloned gene, encoding the DNA
repair modulator has a nucleotide sequence or is complementary to a DNA sequence shown
in FIGURE 8, SEQ ID NO:1.



The human and murine DNA sequences of the DNA repair modulator of the present
invention, or portions thereof, may be prepared as probes to screen for complementary
sequences and genomic clones in the same or alternate species. The present invention
extends to probes so prepared that may be provided for screening cDNA and genomic
libraries for the DNA repair modulator. For example, the probes may be prepared with a
variety of known vectors, such as the phage λ vector. The present invention also includes the
preparation of plasmids including such vectors, and the use of the DNA sequences to
construct vectors expressing antisense RNA or ribozymes which would attack the mRNAs of
any or all of the DNA sequences disclosed herein. Correspondingly, the preparation of
antisense RNA and ribozymes is included herein.

The present invention also includes DNA repair modulator proteins having the activities
noted herein, and that display the amino acid sequences set forth and described above, as set
forth in FIGURE 8, SEQ ID NO:2, including that depicted in FIGURE 7, SEQ ID NO:42,
which comprises amino acids 8-304 of SEQ ID NO:2.

The present invention also includes fusion proteins comprising a mammalian endonuclease
III of the present invention, and nucleic acids that encode such fusion proteins. In one
embodiment, the fusion protein comprises a human endonuclease III. In a preferred
embodiment, the human endonuclease III has the amino acid sequence of SEQ ID NO:42
(amino acids 8-304 of SEQ ID NO:2). In one embodiment the fusion protein is a
mammalian endonuclease III fused to glutathione S-transferase.

In a further embodiment of the invention, the full DNA sequence of the recombinant DNA
molecule or cloned gene so determined or a corresponding fusion protein may be operatively
linked to an expression control sequence which may be introduced into an appropriate host.
The invention accordingly extends to unicellular hosts transformed with the cloned gene or
recombinant DNA molecule comprising a DNA sequence encoding the present DNA repair
modulator(s), and more particularly, the complete DNA sequence determined from the
sequences set forth above and in FIGURE 8, SEQ ID NO:1, or a fusion protein thereof.

According to other preferred features of certain preferred embodiments of the present
invention, a recombinant expression system is provided to produce biologically active animal
or human DNA repair modulator(s).

More particularly, the present invention provides a mammalian endonuclease III polypeptide,
nucleic acids encoding the same, methods for producing the polypeptide, methods for
treating diseases or disorders associated with DNA damage, such as occurs in UV-irradiated
tissues, chemically oxidized tissues, and gamma-irradiated tissues, and methods for
diagnosing susceptibility to DNA damage by determining the level of activity of the
endonuclease III in tissues.

In a further aspect the invention provides a mammalian endonuclease III purified greater than
about 5000-fold, which endonuclease III demonstrates pyrimidine hydrate DNA-glycosylase
activity, thymine glycol DNA-glycosylase activity, and lyase activity, and reductively cross
links with a thymine glycol containing oligodeoxynucleotide. Further, as illustrated in the
Examples, infra, the endonuclease in 100 mM NaCl may elute from a 1 ml single
stranded-DNA-cellulose chromatography column eluted with a 12.5 ml gradient of 100 to
600 mM NaCl at 0.2 ml/min in about fractions 12-18, and more preferably in about fractions
15-17.
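The recited column conditions fix the slope and duration of the salt gradient, which a short calculation makes explicit. The following is only a sketch: it assumes an ideal linear gradient with no dead volume, and the function names are illustrative rather than taken from the Examples. The fraction volume is not stated in this summary, so no attempt is made to convert fraction numbers into NaCl concentrations.

```python
# Sketch of the recited gradient: 12.5 ml, 100 to 600 mM NaCl, 0.2 ml/min.
# Assumption (not stated in the text): the gradient is perfectly linear
# and there is no dead volume between the mixer and the column outlet.

GRADIENT_VOLUME_ML = 12.5
START_MM = 100.0
END_MM = 600.0
FLOW_ML_PER_MIN = 0.2


def nacl_mM_at_volume(volume_ml: float) -> float:
    """Nominal NaCl concentration (mM) after `volume_ml` of gradient has run."""
    if not 0.0 <= volume_ml <= GRADIENT_VOLUME_ML:
        raise ValueError("volume must lie within the gradient")
    slope_mM_per_ml = (END_MM - START_MM) / GRADIENT_VOLUME_ML
    return START_MM + slope_mM_per_ml * volume_ml


def gradient_duration_min() -> float:
    """Time needed to deliver the whole gradient at the recited flow rate."""
    return GRADIENT_VOLUME_ML / FLOW_ML_PER_MIN


if __name__ == "__main__":
    print("duration (min):", gradient_duration_min())
    print("NaCl at gradient midpoint (mM):", nacl_mM_at_volume(6.25))
```

Under these assumptions the gradient rises by 40 mM NaCl per ml and takes roughly an hour to deliver.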

As demonstrated in the Examples, infra, the present endonuclease III can have an apparent
molecular weight of 29 kDa as determined by gel filtration, and an apparent molecular
weight of 31 kDa as determined by SDS-PAGE analysis.

In a further aspect the present endonuclease III has a partial amino acid sequence selected
from the group consisting of SEQ ID NOs: 25, 26, 27, 28 in FIGURE 4, line C (bovine), SEQ
ID NO:6 in FIGURE 4, line D (human), and SEQ ID NO:20 in FIGURE 4, line E (rat). In a
further embodiment, the endonuclease III has an amino acid sequence selected from the
group consisting of bovine endonuclease III, human endonuclease III, and rat endonuclease
III. In a particular embodiment, the endonuclease III is a human endonuclease III having an
amino acid sequence corresponding to FIGURE 8, SEQ ID NO:2.

The present invention extends to a purified nucleic acid encoding a mammalian endonuclease
III, which endonuclease III demonstrates pyrimidine hydrate DNA-glycosylase activity,
thymine glycol DNA-glycosylase activity, and lyase activity, and reductively cross links with
a thymine glycol containing oligodeoxynucleotide. In an illustrative embodiment, the
endonuclease III in 100 mM NaCl elutes from a 1 ml single stranded-DNA-cellulose
chromatography column eluted with a 12.5 ml gradient of 100 to 600 mM NaCl at 0.2
ml/min in about fractions 12-18, and preferably elutes in about fractions 15-17. In a further
illustration, the endonuclease III has a Stokes radius corresponding to a protein having a
molecular weight of 29 kDa as determined by gel filtration, and a molecular weight of 31
kDa as determined by SDS-PAGE analysis. As illustrated, the purified nucleic acid of the
present invention may encode endonuclease III having a partial amino acid sequence selected
from the group consisting of FIGURE 4, SEQ ID NOs: 25, 26, 27, 28 in line C, SEQ ID NO:6
in line D and SEQ ID NO:20 in line E; particularly, the purified nucleic acid encodes the
endonuclease III having an amino acid sequence selected from the group consisting of bovine
endonuclease III, human endonuclease III, and rat endonuclease III.

In specific embodiments, illustrated, the purified nucleic acid has a nucleotide sequence
corresponding or complementary to the nucleotide sequence selected from the group
consisting of SEQ ID NOs: 25, 26, 27, 28 in a bovine, SEQ ID NO:6 in a human and SEQ ID
NO:20 in a rat. In another embodiment, the nucleic acid is hybridizable under stringent
conditions to a nucleic acid having a nucleotide sequence corresponding or complementary
to the nucleotide sequence selected from the group consisting of FIGURE 4, SEQ ID NOs:
25, 26, 27, 28 in line C, SEQ ID NO:6 in line D and SEQ ID NO:20 in line E. In a preferred
embodiment the mammalian endonuclease III is a human endonuclease III. In a more
preferred embodiment the purified nucleic acid encodes the endonuclease III having an amino
acid sequence corresponding to SEQ ID NO:2. In the most preferred embodiment of this
type the purified nucleic acid has a nucleotide sequence as depicted in SEQ ID NO:1, from
nucleotide 9 to nucleotide 920.
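The nucleotide range recited above can be checked against the residue numbering used elsewhere in this summary. The sketch below is plain arithmetic, not sequence data. It assumes that the range from nucleotide 9 to nucleotide 920 is inclusive, and it does not settle whether the final triplet is a sense codon or a stop codon; the helper names are illustrative.

```python
# Arithmetic check on the recited coordinates (no sequence data involved).
# Assumption: nucleotide and residue ranges are inclusive at both ends.

def codon_count(first_nt: int, last_nt: int) -> int:
    """Number of complete triplets in an inclusive nucleotide range."""
    length = last_nt - first_nt + 1
    if length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return length // 3


def residue_span(first_aa: int, last_aa: int) -> int:
    """Number of residues in an inclusive amino acid range."""
    return last_aa - first_aa + 1


if __name__ == "__main__":
    # Nucleotides 9 to 920 of SEQ ID NO:1
    print("triplets:", codon_count(9, 920))
    # Amino acids 8-304 of SEQ ID NO:2 (the SEQ ID NO:42 fragment)
    print("fragment residues:", residue_span(8, 304))
```

Nucleotides 9 to 920 span 912 bases, or 304 triplets, which is consistent with residue numbering that runs to position 304 in SEQ ID NO:2; the fragment of amino acids 8-304 covers 297 residues.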



As can be readily appreciated by one of ordinary skill in the art, the mammalian
endonuclease III can be an allelic variant, with minor nucleotide or amino acid sequence
variations as compared to the specific mammalian endonuclease IIIs exemplified herein.
Such allelic variants include, but are not limited to, mutants with decreased or ablated, or
increased, enzymatic activity.

In a specific embodiment, the purified nucleic acid is DNA encoding the mammalian
endonuclease III. Such a DNA molecule may be a recombinant DNA vector. Preferably, the
DNA vector is an expression vector, wherein the DNA encoding the mammalian
endonuclease III, preferably a human endonuclease III, is operatively associated with an
expression control sequence. The present invention accordingly extends to a recombinant
host cell comprising the DNA expression vector capable of expressing the mammalian
endonuclease III, preferably human endonuclease III. The invention provides a
corresponding method for producing a mammalian endonuclease III comprising expressing
the expression vector in a recombinant host cell under conditions that provide for expression
of the endonuclease III. Thus, the invention advantageously provides for expression of
recombinant mammalian endonuclease III, which is important for direct therapy,
identification of agonists and antagonists of endonuclease III, and production of
anti-mammalian endonuclease III antibodies.



The recombinant DNA of the invention allows for direct gene therapy of conditions
associated with a mutation or decreased expression of mammalian endonuclease III in a
mammal, preferably a human. According to the invention, such gene therapy may be
effected by transient expression of the endonuclease III in affected, differentiated tissues, or
it may involve long term or indefinite expression by gene transfer into progenitor or
undifferentiated cells. Thus, in a further embodiment, the invention provides a recombinant
virus comprising the DNA vector. The recombinant virus may be selected from the group
consisting of a retrovirus, herpes simplex virus (HSV), papillomavirus, Epstein Barr virus
(EBV), adenovirus, and adeno-associated virus (AAV). In another embodiment, the
invention provides a naked DNA vector.



The invention provides a method for increasing the level of expression of a mammalian
endonuclease III comprising introducing an expression vector into a host in vivo under
conditions that provide for expression of the endonuclease III. The expression vector may be
a viral expression vector, or it may be a naked DNA expression vector. Various methods are
known in the art for transfecting cells with DNA in vivo, including techniques such as
lipofection, targeted DNA transfer, and the like.

According to one embodiment of the invention, the expression vector may be introduced into
tissue exposed to radiation prior to exposure to the radiation. For example, a human who
may be exposed to the sun or to gamma irradiation may undergo prophylactic gene therapy,
preferably with a transient expression vector, prior to exposure to the radiation source. In a
specific embodiment, the gene therapy vector of the invention may be provided in a gel,
cream, or lotion for application to the skin. Preferably, such a gel, cream, or lotion contains a
sunscreen.



In another embodiment, the expression vector is introduced into tissue exposed to radiation
after exposure to the radiation.

As noted above, in a preferred aspect the foregoing nucleic acids, DNA molecules, and 
associated methods involve human endonuclease III, and the treatment or administration to a 
human in vivo. 

The invention further provides a method for treating a disease or disorder associated with
DNA damage in a mammal, comprising increasing the level of mammalian endonuclease III
in cells demonstrating DNA damage, wherein the endonuclease III demonstrates pyrimidine
hydrate DNA-glycosylase activity, thymine glycol DNA-glycosylase activity, and lyase
activity, and reductively cross links with a thymine glycol containing oligodeoxynucleotide.
In one embodiment, the level of mammalian endonuclease III is increased by administration
of purified endonuclease III to the cells demonstrating DNA damage. In another
embodiment, the level of mammalian endonuclease III is increased by administration of a
recombinant expression vector to the cells demonstrating DNA damage, which expression
vector provides for expression of the mammalian endonuclease III in vivo. Preferably the
expression vector is a viral expression vector or a naked DNA expression vector. The
expression vector may be introduced into tissue exposed to radiation prior to exposure to the
radiation, or it may be introduced into tissue exposed to radiation after exposure to the
radiation.

The present invention further provides oligonucleotide probes, particularly labeled probes,
and PCR primers to isolate nucleic acids, such as mRNA, cDNA, and genomic DNA,
encoding a mammalian endonuclease III, preferably a human endonuclease III, with the
proviso that such probes do not correspond to the 3' EST (expressed sequence tags) from H.
sapiens deposited with GenBank and assigned accession number F04657 or Rattus sp.
deposited with GenBank and assigned accession number H33255. Such probes can be used
to isolate a nucleic acid encoding a mammalian endonuclease III (preferably a human
endonuclease III), detect the level of expression of a mammalian endonuclease III in a tissue
sample, or detect a mutation in a mammalian endonuclease III, e.g., by hybridization (or lack
of hybridization) of a specific probe under highly stringent conditions, or by detection of a
mutated sequence in a PCR-amplified DNA by methods such as single stranded molecular
weight polymorphisms, or the introduction or elimination of a restriction site. Thus, the
invention is directed to an oligonucleotide of greater than 10 nucleotides which hybridizes
under stringent conditions, wherein the Tm is greater than 60°C. Preferably, the
oligonucleotide hybridizes at a Tm of greater than 65°C. In a specific embodiment the
oligonucleotide hybridizes at 40% formamide, with 5x or 6x SSC; in another specific
embodiment, the oligonucleotide hybridizes at 50% formamide. In a specific embodiment,
exemplified infra, the probe is an oligonucleotide having the nucleotide sequence
GTGGCACGAGATCAATGGACTCTTG, SEQ ID NO:4. In another specific embodiment,
exemplified infra, hybridization is detected using biotinylated sequence-specific
oligonucleotides and magnetic streptavidin beads to enrich a library prior to screening.
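Basic properties of the exemplified probe follow directly from its sequence. The sketch below computes the length, GC fraction, and reverse complement of SEQ ID NO:4 as recited above, treating the space inside the printed sequence as an extraction artifact (an assumption). It does not estimate a melting temperature, since that depends on salt and formamide conditions and on a choice of model that the text does not specify; the function names are illustrative.

```python
# Properties of the probe recited as SEQ ID NO:4.
# Assumption: the internal space in the printed sequence is an artifact,
# so the probe is read as one contiguous 5'->3' oligonucleotide.

PROBE_SEQ_ID_NO_4 = "GTGGCACGAGATCAATGGACTCTTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print("length:", len(PROBE_SEQ_ID_NO_4))
    print("GC fraction:", gc_fraction(PROBE_SEQ_ID_NO_4))
    print("target strand:", reverse_complement(PROBE_SEQ_ID_NO_4))
```

Read this way, the probe is longer than the 10-nucleotide minimum recited above, and the reverse complement is the strand it would be expected to anneal to.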

In addition, the present invention provides an antibody that specifically binds to the
mammalian endonuclease III, preferably a human endonuclease III. The antibody of the
invention may be polyclonal or monoclonal. An antibody of the invention can be used to
detect the presence or level of mammalian endonuclease III, i.e., in situ in a tissue biopsy or
a tissue (using in vivo imaging techniques), or in vitro in a tissue homogenate.

The present invention also provides biochemical techniques for detecting mammalian
endonuclease III, e.g., by detecting enzymatic activity characteristic of endonuclease III.

However, by providing for high purification of endogenous endonuclease III, the invention
allows for quantitative evaluation of the level of endonuclease III activity in a tissue. In one
embodiment, the enzymatic activity can be evaluated by DNA-glycosylase activity (e.g.,
pyrimidine hydrate or thymine glycol DNA-glycosylase activity). Alternatively, enzymatic
activity can be measured by lyase activity. In another aspect, enzymatic activity can be
measured by evaluating the level of reductive cross linking to a thymine glycol-containing
DNA oligodeoxynucleotide. In a particular aspect, the amount of cross-linked polypeptide
can be compared to uncross-linked polypeptide to determine whether a mutation in the
expressed polypeptide affects enzymatic activity, and thus the ability to form cross links.

Thus, a particular advantage of the present invention is the ability to detect or measure
increased sensitivity to DNA damage comprising detecting a decrease in the level of activity
of a mammalian endonuclease III in cells from a mammal, wherein the endonuclease III
demonstrates pyrimidine hydrate DNA-glycosylase activity, thymine glycol
DNA-glycosylase activity, and lyase activity, and reductively cross links with a thymine glycol
containing oligodeoxynucleotide. In other words, the invention provides for identifying
individuals, especially humans, at risk for radiation-induced DNA damage because of
insufficient endonuclease III DNA repair activity. It is a particular advantage that the present
invention provides any one of three methods for detecting endonuclease III levels, which can
be used independently or in a combination of one or more.






Furthermore, in addition to detecting the level of endonuclease III (whether mRNA
expression, protein expression, or enzymatic activity), the invention allows for determination
of inactivating mutations. Thus, in one embodiment the decrease in the level of activity of
mammalian endonuclease III is detected by detecting a decrease in the level of expression of
the mammalian endonuclease III polypeptide. Such a decrease can be evaluated by
immunological methods, by biochemical methods (e.g., purification and detection of band
intensity by PAGE, or enzymatic activity), or by binding to DNA. In another aspect, the
decrease can be evaluated with specific nucleic acid probes, such that a decrease in the level
of activity of mammalian endonuclease III is detected by detecting a decrease in the level of
expression of the mammalian endonuclease III mRNA, or a mutation in the DNA encoding
the endonuclease III. As noted above, such a mutation can be detected by hybridization
techniques, by PCR, or biochemically.



Individuals who are found to have decreased levels of DNA repair activity mediated by
endonuclease III can be treated by limiting exposure to radiation sources, by using greater
protection (such as sunscreen or radiation screens), by monitoring for neoplasms to allow
early intervention, or by increasing the level of endonuclease III activity as described above.

The present invention naturally contemplates several means for preparation of the DNA
repair modulator(s) or endonuclease III, including as illustrated herein known recombinant
techniques, and the invention is accordingly intended to cover such synthetic preparations
within its scope. The isolation of the cDNA and amino acid sequences disclosed herein
facilitates the reproduction of the DNA repair modulator(s) by such recombinant techniques,
and accordingly, the invention extends to expression vectors prepared from the disclosed
DNA sequences for expression in host systems by recombinant DNA techniques, and to the
resulting transformed hosts.

The invention includes an assay system for screening of potential drugs effective to modulate
DNA repair modulator or endonuclease III activity of target mammalian cells by interrupting
or potentiating the DNA repair modulator or endonuclease III. In one instance, the test drug
could be administered to a cellular sample with the ligand that activates the DNA repair
modulator or endonuclease III, or an extract containing the activated DNA repair modulator
or endonuclease III, to determine its effect upon the binding activity of the DNA repair
modulator or endonuclease III to any chemical sample (including DNA), or to the test drug,
by comparison with a control.

The assay system could more importantly be adapted to identify drugs or other entities that
are capable of binding to the DNA repair modulator or endonuclease III, and/or DNA repair
modulator or endonuclease III factors or proteins, either in the cytoplasm or in the nucleus,
thereby inhibiting or potentiating DNA repair modulator or endonuclease III activity. Such
an assay would be useful in the development of drugs that would be specific against particular
cellular activity, or that would potentiate such activity, in time or in level of activity.

In yet a further embodiment the invention contemplates antagonists of the activity of a DNA
repair modulator or endonuclease III, and in particular, an agent or molecule that inhibits
DNA repair modulator or endonuclease III. In a specific embodiment, the antagonist can be
a peptide having the sequence of a portion of an active domain of a DNA repair modulator or
endonuclease III.



The present invention likewise extends to the development of antibodies against the DNA
repair modulator(s) or endonuclease III, including naturally raised and recombinantly
prepared antibodies. For example, the antibodies could be used to screen expression libraries
to obtain the gene or genes that encode the DNA repair modulator or endonuclease III. Such
antibodies could include both polyclonal and monoclonal antibodies prepared by known
genetic techniques, as well as bi-specific (chimeric) antibodies, and antibodies including
other functionalities suiting them for additional diagnostic use conjunctive with their
capability of modulating DNA repair modulator or endonuclease III activity.



Thus, the DNA repair modulator(s) or endonuclease III, their analogs, and any
antagonists or antibodies that may be raised thereto, are capable of use in connection with
various diagnostic techniques, including immunoassays, such as a radioimmunoassay, using
for example, an antibody to the DNA repair modulator or endonuclease III that has been
labeled by either radioactive addition, or radioiodination.

In an immunoassay, a control quantity of the antagonists or antibodies thereto, or the like
may be prepared and labeled with an enzyme, a specific binding partner and/or a radioactive
element, and may then be introduced into a cellular sample. After the labeled material or its
binding partner(s) has had an opportunity to react with sites within the sample, the resulting
mass may be examined by known techniques, which may vary with the nature of the label
attached.






In the instance where a radioactive label, such as the isotopes ³H, ¹⁴C, ³²P, ³⁵S, ³⁶Cl, ⁵¹Cr,
⁵⁷Co, ⁵⁸Co, ⁵⁹Fe, ⁹⁰Y, ¹²⁵I, ¹³¹I, and ¹⁸⁶Re, is used, known currently available counting
procedures may be utilized. In the instance where the label is an enzyme, detection may be
accomplished by any of the presently utilized colorimetric, spectrophotometric,
fluorospectrophotometric, amperometric or gasometric techniques known in the art.

The present invention includes an assay system which may be prepared in the form of a test
kit for the quantitative analysis of the extent of the presence of the DNA repair modulator or
endonuclease III, or to identify drugs or other agents that may mimic or block their activity.
The system or test kit may comprise a labeled component prepared by one of the radioactive
and/or enzymatic techniques discussed herein, coupling a label to the DNA repair modulator
or endonuclease III, their agonists and/or antagonists, and one or more additional
immunochemical reagents, at least one of which is a free or immobilized ligand, capable
either of binding with the labeled component, its binding partner, one of the components to
be determined or their binding partner(s).

The present invention represents a significant advance, as the presence and activity of
mammalian endonuclease III has, at best, been postulated. By providing mammalian
endonuclease III, the present invention opens an avenue for repairing DNA damaged by
irradiation or oxidation, thus avoiding transformation of damaged tissues and development of
cancers. This advance has important implications for treating or preventing skin cancers, and
cancers associated with gamma irradiation, including those resulting from exposure to
radiation from high altitude aviation, nuclear medicine, nuclear reactors, or nuclear weapons.
It is a particular advantage that the present invention provides a human endonuclease III
polypeptide, nucleic acids encoding the polypeptide, and associated therapeutic and
diagnostic methods.

Accordingly, it is a principal object of the present invention to provide a DNA repair
modulator or mammalian endonuclease III and its subunits in purified form that exhibits
certain characteristics and activities associated with the enzyme endonuclease III.

It is a further object of the present invention to provide antibodies to the DNA repair
modulator or endonuclease III and its subunits, and methods for their preparation, including
recombinant means.

It is a further object of the present invention to provide a method for detecting the presence,
amount and activity of the DNA repair modulator or endonuclease III and its subunits in
mammals in which invasive, spontaneous, or idiopathic pathological states are suspected to
be present.

It is a further object of the present invention to provide a method and associated assay system
for screening substances such as drugs, agents and the like, potentially effective in either
mimicking the activity or combating the adverse effects of the DNA repair modulator or
endonuclease III, and/or its subunits in mammals.

It is a still further object of the present invention to provide a method for the treatment of
mammals to control the amount or activity of the DNA repair modulator or endonuclease III
or subunits thereof, so as to alter the adverse consequences of such presence or activity, or
where beneficial, to enhance such activity.

It is a still further object of the present invention to provide a method for the treatment of
mammals to control the amount or activity of the DNA repair modulator or endonuclease III
or its subunits, so as to treat or avert the adverse consequences of invasive, spontaneous or
idiopathic pathological states.

It is a still further object of the present invention to provide pharmaceutical compositions for
use in therapeutic methods which comprise or are based upon the DNA repair modulator or
endonuclease III, its subunits, their binding partner(s), or upon agents or drugs that control
the production, or that mimic or antagonize the activities of the DNA repair modulator or
endonuclease III.



Other objects and advantages will become apparent to those skilled in the art from a review 
of the ensuing description which proceeds with reference to the following illustrative 
drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGURE 1. SDS-PAGE analysis of the purification fractions. Lanes 1 and 7 contain 
molecular weight markers. Lanes 2-5 contain Fractions I-IV respectively (described in the 
text and Table 1). Lane 6 contains material from ssDNA cellulose column fraction 17, which 
was then pooled with fractions 15 and 16 to yield Fraction V. 






FIGURE 2A. Coelution of pyrimidine hydrate and thymine glycol DNA-glycosylase 
activities from the ssDNA cellulose column. Fractions from the ssDNA cellulose column 
were assayed simultaneously for both enzyme activities. Activities were normalized by 
dividing the activity of each fraction by the activity of the fraction with maximum activity. 

FIGURE 2B. Coelution of pyrimidine hydrate DNA glycosylase activity and AP lyase 
activity from the ssDNA cellulose column. A second calf thymus preparation was purified 
through ssDNA cellulose, and elution fractions analyzed for both enzyme activities, 
normalized as in Figure 2A. 

FIGURE 2C. SDS-PAGE analysis of ssDNA cellulose elution fractions (Fraction V). A 25 
µL aliquot from each of the indicated fractions shown in 2A was analyzed by SDS-PAGE. 
Fractions 14-18 contain the predominant 31 kD species. The extreme left and right lanes 
contain molecular weight markers. 

FIGURE 3A. SDS-PAGE analysis of E. coli endonuclease III and the bovine enzyme after 
incubation with the thymine glycol-containing oligodeoxynucleotide and NaCNBH3. Lanes 1 
and 10 contain molecular weight markers. Lane 2 contains the product of the reaction of 
E. coli endonuclease III and NaCNBH3. Lane 3 contains the product of the same reaction 
mixture as Lane 2 with addition of duplex 5'-³²P-labeled oligodeoxynucleotide containing a 
single thymine glycol (TG) residue. Lane 4 contains the product of fraction 17 eluted from 
the ssDNA cellulose column and incubated with NaCNBH3 but no oligodeoxynucleotide. 
Lane 5 contains the product of elution fraction 17 incubated with the 5'-³²P-labeled 
oligodeoxynucleotide but no NaCNBH3. Lane 6 contains the product of elution fraction 
17 incubated with both the oligodeoxynucleotide and NaCNBH3. Lane 7 is the same mixture 
as 6 except that the complementary (non-thymine glycol-containing) oligodeoxynucleotide 
was 5' labeled with ³²P. Lanes 8 and 9 contain the products of the incubation of ssDNA- 
cellulose fraction 8, which did not exhibit enzymatic activity, alone or with 
oligodeoxynucleotide and NaCNBH3 respectively. 

FIGURE 3B. Phosphorimage of the SDS-PAGE gel of Figure 3A. The lanes are identical to 
those of Panel A. 

FIGURE 4. Amino acid sequence alignment of E. coli endonuclease III (Line A), C. elegans 
translated protein (Line B), bovine primary amino acid sequences for peptides of 15, 23, 14, 
and 22 amino acids respectively (Line C), H. sapiens and Rattus sp. sequences obtained by 
translation of partial cDNA sequence (Lines D and E, respectively). X in sequence C 
represents an indeterminate amino acid residue. [illegible] in sequences D and E represents 
indeterminate nucleotide sequences. The 6 amino acid region presented in boldface and italic 
constitutes a portion of the active site of endonuclease III. A 22 amino acid region of near 
identity between the predicted C. elegans sequence, the primary bovine sequence, and the H. 
sapiens and Rattus sp. translated partial cDNA sequences is presented in boldface. The 4 
cysteine residues presented in double underlined type represent the ligands of the iron-sulfur 
cluster of E. coli endonuclease III. 

FIGURE 5 is a cDNA encoding essentially full length human endonuclease III (encoding 
amino acids 8-304 of SEQ ID NO:2) prepared and identified in accordance with the present 
invention. 






FIGURE 6 is the coding sequence (open reading frame) of human endonuclease III (amino 
acids 8-304 of SEQ ID NO:2) prepared and identified in accordance with the present 
invention. 



FIGURE 7 is the amino acid sequence prepared from the translation of the cDNA of 
FIGURE 5. 

FIGURE 8. Nucleotide and deduced amino acid sequence of the human pyrimidine 
hydrate-thymine glycol DNA glycosylase/AP lyase. The sequences of peptides obtained by 
proteolytic digestion of purified bovine pyrimidine hydrate-thymine glycol DNA 
glycosylase/AP lyase are in italics and are aligned with the homologous human amino acid 
sequence. 

FIGURE 9. Northern blot analysis. Northern blot analysis was performed against 1 µg of 
mRNA from human spleen (Lane 2) and 2 µg of mRNA from human 293T cells (Lane 3) 
using the full length ³²P-labeled cDNA for the human pyrimidine hydrate-thymine glycol 
DNA glycosylase/AP lyase as a probe. Methylene blue-stained RNA markers are shown in 
Lane 1. 

FIGURE 10. Expression and purification of the recombinant human pyrimidine 
hydrate-thymine glycol DNA glycosylase/AP lyase. SDS-PAGE analysis of the GST fusion 
protein. Lane 2 is total SDS lysate from uninduced E. coli containing the pGEX-2T vector. 
Lane 3 is total SDS lysate of the same E. coli after induction by IPTG for 5 h. Lane 4 is the 
soluble fraction obtained by centrifugation of induced E. coli disrupted by sonication. Lane 5 
is the purified GST protein after elution from glutathione agarose affinity media. Lane 6 is 
the total SDS lysate from uninduced E. coli containing the pGEX-2T vector into which the 
sequence encoding the human enzyme had been cloned. Lane 7 is total SDS lysate from 
induced E. coli containing the recombinant pGEX-2T vector. Lane 8 is the soluble fraction 
of induced disrupted E. coli containing the recombinant pGEX-2T vector. Lane 9 is the 
purified GST fusion protein after elution from affinity media. Lanes 1 and 10 are MW 
markers. 

FIGURE 11. SDS-PAGE analysis of E. coli endonuclease III and the human GST fusion 
protein after incubation with the thymine glycol-containing oligodeoxynucleotide and 
NaCNBH3. A, Lane 1 contains MW markers. Lane 2 contains the product of the incubation 
of E. coli endonuclease III with NaCNBH3. Lane 3 contains the product of the same 
incubation mixture as Lane 2 with addition of duplex 5'-³²P-labeled 
oligodeoxynucleotide containing a single thymine glycol residue. Lane 4 contains the 
product of the incubation of the purified non-fusion GST protein (Fig. 3, Lane 5) with 
NaCNBH3 but no oligodeoxynucleotide. Lane 5 contains the product of the incubation of the 
same purified non-fusion GST protein with NaCNBH3 and the 5'-³²P-labeled 
oligodeoxynucleotide. Lanes 6 and 7 contain the products of the incubation of the purified 
GST fusion protein (Fig. 3, Lane 9) with NaCNBH3 alone, or NaCNBH3 and 
oligodeoxynucleotide respectively. B, Phosphorimage of the SDS-PAGE gel of Fig. 4A. The 
lanes are identical to those described in A. The MW markers in Lane 1 are not radiolabeled, 
but are the same Coomassie-stained markers shown in Fig. 4A. 

FIGURE 12. V vs. [E] plot. Amount of thymine glycol released after incubation of oxidized 
alternating poly (dA-dT) for 20 min with recombinant protein. The points represent the 
average of 2 determinations. There was less than 5% variability among duplicate samples. 

FIGURE 13. Spectroscopic analysis. Figure 13A, Optical absorption spectrum of the 
purified human pyrimidine hydrate-thymine glycol DNA glycosylase/AP lyase-GST fusion 
protein. Figure 13B, Optical absorption spectrum of the purified non-fusion GST protein of 
S. japonicum. 

FIGURE 14. Histogram of FISH analysis results. Ten mitotic figures to which the FISH 
probe had bound were analyzed to determine the precise position of the gene on chromosome 
16. Each dot represents the position of the human gene as determined through one such 
analysis. 



FIGURE 15. Alignment of the amino acid sequence of E. coli endonuclease III with those of 
putatively homologous proteins from 3 evolutionary domains. The amino acid sequence of E. 
coli endonuclease III (Eco) is aligned with homologous sequences from H. influenzae (Hin), 
B. subtilis (Bsu), M. jannaschii (Mja), S. pombe (Spo), C. elegans (Cel), H. sapiens (Hsa), as 
well as two unique homologous sequences from S. cerevisiae (Sce and Sce non Fe-S). 
Residues in black boxes indicate identical sequences. Residues in gray boxes indicate 
conservative substitution. Dashes denote gaps in sequence introduced to maximize 
alignment. Numbers in the left hand column refer to the first amino acid residue in each line 
of the respective protein sequences. Numbers in the lower right hand corner indicate the total 
number of amino acid residues in each protein sequence. In archeons and eukaryotes the 
proteins which are homologous to E. coli endonuclease III have unique extensions at their 
N- and/or C-termini. For the sake of clarity these extensions have been omitted from the 
figure. Alignment of residues 83-304 of the human enzyme with residues 2-209 of the E. coli 
enzyme demonstrates that there is 29.3% identity and 51.9% similarity between the 2 
proteins. 



DETAILED DESCRIPTION 



In accordance with the present invention there may be employed conventional molecular 
biology, microbiology, and recombinant DNA techniques within the skill of the art. Such 
techniques are explained fully in the literature. See, e.g., Sambrook et al., "Molecular 
Cloning: A Laboratory Manual" (1989); "Current Protocols in Molecular Biology" Volumes 
I-III [Ausubel, R. M., ed. (1994)]; "Cell Biology: A Laboratory Handbook" Volumes I-III [J. 
E. Celis, ed. (1994)]; "Current Protocols in Immunology" Volumes I-III [Coligan, J. E., ed. 
(1994)]; "Oligonucleotide Synthesis" [M.J. Gait, ed. (1984)]; "Nucleic Acid Hybridization" 
[B.D. Hames & S.J. Higgins, eds. (1985)]; "Transcription And Translation" [B.D. Hames & 
S.J. Higgins, eds. (1984)]; "Animal Cell Culture" [R.I. Freshney, ed. (1986)]; "Immobilized 
Cells And Enzymes" [IRL Press, (1986)]; B. Perbal, "A Practical Guide To Molecular 
Cloning" (1984). 

Therefore, if appearing herein, the following terms shall have the definitions set out below. 

The terms "DNA repair modulator", "endonuclease III", "DNA glycosylase/AP lyase" and 
any variants not specifically listed, may be used herein interchangeably, and as used 
throughout the present application and claims refer to proteinaceous material including single 
or multiple proteins, and extends to those proteins having the amino acid sequence data 
described herein and presented in FIGURE 7, and the profile of activities set forth herein and 
in the Claims. Accordingly, proteins displaying substantially equivalent or altered activity 
are likewise contemplated. These modifications may be deliberate, for example, such as 
modifications obtained through site-directed mutagenesis, or may be accidental, such as 
those obtained through mutations in hosts that are producers of the complex or its named 
subunits. Also, the terms "DNA repair modulator", "endonuclease III" and "DNA 
glycosylase/AP lyase" are intended to include within their scope proteins specifically recited 
herein as well as all substantially homologous analogs and allelic variations. 

The amino acid residues described herein are preferred to be in the "L" isomeric form. 
However, residues in the "D" isomeric form can be substituted for any L-amino acid residue, 
as long as the desired functional property of immunoglobulin-binding is retained by the 
polypeptide. NH2 refers to the free amino group present at the amino terminus of a 
polypeptide. COOH refers to the free carboxy group present at the carboxy terminus of a 
polypeptide. In keeping with standard polypeptide nomenclature, J. Biol. Chem., 243:3552- 
59 (1969), abbreviations for amino acid residues are shown in the following Table of 
Correspondence: 

TABLE OF CORRESPONDENCE 

SYMBOL                      AMINO ACID 
1-Letter      3-Letter 

Y             Tyr           tyrosine 
G             Gly           glycine 
F             Phe           phenylalanine 
M             Met           methionine 
A             Ala           alanine 
S             Ser           serine 
I             Ile           isoleucine 
L             Leu           leucine 
T             Thr           threonine 
V             Val           valine 
P             Pro           proline 
K             Lys           lysine 
H             His           histidine 
Q             Gln           glutamine 
E             Glu           glutamic acid 
W             Trp           tryptophan 
R             Arg           arginine 
D             Asp           aspartic acid 
N             Asn           asparagine 
C             Cys           cysteine 

It should be noted that all amino-acid residue sequences are represented herein by formulae 
whose left and right orientation is in the conventional direction of amino-terminus to 
carboxy-terminus. Furthermore, it should be noted that a dash at the beginning or end of an 
amino acid residue sequence indicates a peptide bond to a further sequence of one or more 
amino-acid residues. The above Table is presented to correlate the three-letter and one-letter 
notations which may appear alternately herein. 
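
By way of non-limiting illustration only, the correlation between the one-letter and three-letter notations of the above Table may be sketched as follows. The function name, the hyphen separator, and the use of "Xaa" for any symbol not listed in the Table (such as an indeterminate residue X) are assumptions made for the example and form no part of the Table itself.

```python
# One-letter to three-letter codes, transcribed from the Table of Correspondence.
ONE_TO_THREE = {
    "Y": "Tyr", "G": "Gly", "F": "Phe", "M": "Met", "A": "Ala",
    "S": "Ser", "I": "Ile", "L": "Leu", "T": "Thr", "V": "Val",
    "P": "Pro", "K": "Lys", "H": "His", "Q": "Gln", "E": "Glu",
    "W": "Trp", "R": "Arg", "D": "Asp", "N": "Asn", "C": "Cys",
}


def to_three_letter(sequence, separator="-"):
    """Rewrite a one-letter sequence (amino-terminus first) in three-letter form.

    Symbols absent from the Table are rendered as "Xaa" (an assumption of
    this sketch, not a convention stated in the Table).
    """
    return separator.join(ONE_TO_THREE.get(r, "Xaa") for r in sequence.upper())


if __name__ == "__main__":
    print(to_three_letter("YGFMX"))
```

The left-to-right order of the output follows the amino-terminus to carboxy-terminus orientation noted above.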

A "replicon" is any genetic element (e.g., plasmid, chromosome, virus) that functions as an 
autonomous unit of DNA replication in vivo; i.e., capable of replication under its own 
control. 



A "vector" is a replicon, such as a plasmid, phage or cosmid, to which another DNA segment 
may be attached so as to bring about the replication of the attached segment. 

A "DNA molecule" refers to the polymeric form of deoxyribonucleotides (adenine, guanine, 
thymine, or cytosine) in either its single-stranded form or a double-stranded helix. This term 
refers only to the primary and secondary structure of the molecule, and does not limit it to 
any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter 
alia, in linear DNA molecules (e.g., restriction fragments), viruses, plasmids, and 
chromosomes. In discussing the structure of particular double-stranded DNA molecules, 
sequences may be described herein according to the normal convention of giving only the 
sequence in the 5' to 3' direction along the nontranscribed strand of DNA (i.e., the strand 
having a sequence homologous to the mRNA). 

An "origin of replication" refers to those DNA sequences that participate in DNA synthesis. 

A DNA "coding sequence" is a double-stranded DNA sequence which is transcribed and 
translated into a polypeptide in vivo when placed under the control of appropriate regulatory 
sequences. The boundaries of the coding sequence are determined by a start codon at the 5' 
(amino) terminus and a translation stop codon at the 3' (carboxyl) terminus. A coding 
sequence can include, but is not limited to, prokaryotic sequences, cDNA from eukaryotic 
mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and even 
synthetic DNA sequences. A polyadenylation signal and transcription termination sequence 
will usually be located 3' to the coding sequence. 
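
By way of non-limiting illustration only, the boundaries of a coding sequence as defined above (a start codon at the 5' terminus and an in-frame translation stop codon at the 3' terminus) may be located along the nontranscribed strand as sketched below. The function name is hypothetical, and the sketch assumes an ATG start codon, the three standard stop codons, and an intron-free sequence such as a cDNA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_sequence(dna):
    """Return the first ATG-to-stop open reading frame, read 5' to 3'.

    Scans for each ATG in turn and walks codon by codon until an in-frame
    stop codon is met. Returns None when no complete frame is present.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]  # includes start and stop codons
        start = dna.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    print(find_coding_sequence("ccATGAAATAGgg"))
```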

Transcriptional and translational control sequences are DNA regulatory sequences, such as 
promoters, enhancers, polyadenylation signals, terminators, and the like, that provide for the 
expression of a coding sequence in a host cell. 

A "promoter sequence" is a DNA regulatory region capable of binding RNA polymerase in a 
cell and initiating transcription of a downstream (3' direction) coding sequence. For purposes 
of defining the present invention, the promoter sequence is bounded at its 3' terminus by the 
transcription initiation site and extends upstream (5' direction) to include the minimum 
number of bases or elements necessary to initiate transcription at levels detectable above 
background. Within the promoter sequence will be found a transcription initiation site 
(conveniently defined by mapping with nuclease S1), as well as protein binding domains 
(consensus sequences) responsible for the binding of RNA polymerase. Eukaryotic 
promoters will often, but not always, contain "TATA" boxes and "CAT" boxes. Prokaryotic 
promoters contain Shine-Dalgarno sequences in addition to the -10 and -35 consensus 
sequences. 

An "expression control sequence" is a DNA sequence that controls and regulates the 
transcription and translation of another DNA sequence. A coding sequence is "under the 
control" of transcriptional and translational control sequences in a cell when RNA 
polymerase transcribes the coding sequence into mRNA, which is then translated into the 
protein encoded by the coding sequence. 

A "signal sequence" can be included before the coding sequence. This sequence encodes a 
signal peptide, N-terminal to the polypeptide, that communicates to the host cell to direct the 
polypeptide to the cell surface or secrete the polypeptide into the media, and this signal 
peptide is clipped off by the host cell before the protein leaves the cell. Signal sequences can 
be found associated with a variety of proteins native to prokaryotes and eukaryotes. 

The term "oligonucleotide," as used herein in referring to the probe of the present invention, 
is defined as a molecule comprised of two or more ribonucleotides, preferably more than 
three. Its exact size will depend upon many factors which, in turn, depend upon the ultimate 
function and use of the oligonucleotide. 

The term "primer" as used herein refers to an oligonucleotide, whether occurring naturally as 
in a purified restriction digest or produced synthetically, which is capable of acting as a point 
of initiation of synthesis when placed under conditions in which synthesis of a primer 
extension product, which is complementary to a nucleic acid strand, is induced, i.e., in the 
presence of nucleotides and an inducing agent such as a DNA polymerase and at a suitable 
temperature and pH. The primer may be either single-stranded or double-stranded and must 
be sufficiently long to prime the synthesis of the desired extension product in the presence of 
the inducing agent. The exact length of the primer will depend upon many factors, including 
temperature, source of primer and use of the method. For example, for diagnostic 
applications, depending on the complexity of the target sequence, the oligonucleotide primer 
typically contains 15-25 or more nucleotides, although it may contain fewer nucleotides. 

The primers herein are selected to be "substantially" complementary to different strands of a 
particular target DNA sequence. This means that the primers must be sufficiently 
complementary to hybridize with their respective strands. Therefore, the primer sequence 
need not reflect the exact sequence of the template. For example, a non-complementary 
nucleotide fragment may be attached to the 5' end of the primer, with the remainder of the 
primer sequence being complementary to the strand. Alternatively, non-complementary 
bases or longer sequences can be interspersed into the primer, provided that the primer 
sequence has sufficient complementarity with the sequence of the strand to hybridize 
therewith and thereby form the template for the synthesis of the extension product. 
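
By way of non-limiting illustration only, the notion of a primer that is "substantially" complementary despite a non-complementary 5' extension may be sketched as follows. The function names and the scoring rule (fraction of primer bases that pair with the strand when the 3' ends are aligned) are assumptions made for the example; no threshold for sufficient complementarity is implied.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def complementarity(primer, target_site):
    """Fraction of primer bases that pair with the target strand.

    Both sequences are written 5' to 3'. The primer is compared with the
    exact complement of the target site, aligned at the 3' end of the
    primer, so that any 5' tail on the primer counts as unpaired.
    """
    exact = reverse_complement(target_site)
    primer = primer.upper()
    paired = sum(p == q for p, q in zip(reversed(primer), reversed(exact)))
    return paired / len(primer)


if __name__ == "__main__":
    site = "ATGGCCATTG"
    exact_primer = reverse_complement(site)
    tailed_primer = "GGATCC" + exact_primer  # hypothetical 5' tail
    print(complementarity(exact_primer, site))
    print(complementarity(tailed_primer, site))
```

In this sketch the tailed primer retains a fully paired 3' portion, which is the portion from which the extension product would be synthesized.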

As used herein, the terms "restriction endonucleases" and "restriction enzymes" refer to 
bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide 
sequence. 
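
By way of non-limiting illustration only, cleavage at a specific nucleotide sequence may be sketched as a search for the recognition sequence along one strand. The function names are hypothetical. EcoRI, which recognizes GAATTC and cuts between the G and the first A, is used solely as a familiar example and is not an enzyme recited herein; only one strand is modeled.

```python
def find_sites(dna, site):
    """Return the 0-based start positions of a recognition sequence."""
    dna, site = dna.upper(), site.upper()
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]


def digest(dna, site, cut_offset):
    """Split one strand at every recognition site.

    cut_offset is the number of bases from the start of the site to the
    cut position on the strand shown (1 for EcoRI, G^AATTC).
    """
    dna = dna.upper()
    fragments, previous = [], 0
    for start in find_sites(dna, site):
        cut = start + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


if __name__ == "__main__":
    print(digest("AAGAATTCGG", "GAATTC", 1))
```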

A cell has been "transformed" by exogenous or heterologous DNA when such DNA has been 
introduced inside the cell. The transforming DNA may or may not be integrated (covalently 
linked) into chromosomal DNA making up the genome of the cell. In prokaryotes, yeast, and 
mammalian cells for example, the transforming DNA may be maintained on an episomal 
element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one 
in which the transforming DNA has become integrated into a chromosome so that it is 
inherited by daughter cells through chromosome replication. This stability is demonstrated 
by the ability of the eukaryotic cell to establish cell lines or clones comprised of a population 
of daughter cells containing the transforming DNA. A "clone" is a population of cells 
derived from a single cell or common ancestor by mitosis. A "cell line" is a clone of a 
primary cell that is capable of stable growth in vitro for many generations. 

A nucleic acid molecule is "hybridizable" to another nucleic acid molecule, such as a cDNA, 
genomic DNA, or RNA, when a single stranded form of the nucleic acid molecule can anneal 
to the other nucleic acid molecule under the appropriate conditions of temperature and 
solution ionic strength (see Sambrook et al., supra). The conditions of temperature and ionic 
strength determine the "stringency" of the hybridization. For preliminary screening for 
homologous nucleic acids, low stringency hybridization conditions, corresponding to a Tm of 
55°C, can be used (e.g., 5x SSC, 0.1% SDS, 0.25% milk, and no formamide; or 30% 
formamide, 5x SSC, 0.5% SDS). Moderate stringency hybridization conditions correspond 
to a higher Tm, e.g., 40% formamide, with 5x or 6x SSC. High stringency hybridization 
conditions correspond to the highest Tm, e.g., 50% formamide, 5x or 6x SSC. Hybridization 
requires that the two nucleic acids contain complementary sequences, although depending on 
the stringency of the hybridization, mismatches between bases are possible. The appropriate 
stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the 
degree of complementation, variables well known in the art. The greater the degree of 
similarity or homology between two nucleotide sequences, the greater the value of Tm for 
hybrids of nucleic acids having those sequences. The relative stability (corresponding to 
higher Tm) of nucleic acid hybridizations decreases in the following order: RNA:RNA, 
DNA:RNA, DNA:DNA. For hybrids of greater than 100 nucleotides in length, equations for 
calculating Tm have been derived (see Sambrook et al., supra, 9.50-9.51). For hybridization 
with shorter nucleic acids, i.e., oligonucleotides, the position of mismatches becomes more 
important, and the length of the oligonucleotide determines its specificity (see Sambrook et 
al., supra, 11.7-11.8). Preferably a minimum length for a hybridizable nucleic acid is at least 
about 10 nucleotides; preferably at least about 15 nucleotides; and more preferably the length 
is at least about 20 nucleotides; and most preferably 30 nucleotides. 
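
By way of non-limiting illustration only, the dependence of Tm on ionic strength, base composition, formamide concentration and hybrid length may be sketched with a commonly quoted empirical approximation for DNA:DNA hybrids longer than about 100 nucleotides. The particular coefficients below are an assumption of this example and are not recited in the present description; the function name and the value taken for the sodium ion concentration of 5x SSC (about 0.975 M, from 1x SSC at about 0.195 M Na+) are likewise assumptions.

```python
import math


def estimate_tm(length, gc_percent, na_molar, formamide_percent=0.0):
    """Approximate Tm in degrees C for a long DNA:DNA hybrid.

    Empirical form (assumed for this sketch):
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 0.63*(%formamide) - 600/length
    """
    if length <= 0 or na_molar <= 0:
        raise ValueError("length and sodium concentration must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length)


if __name__ == "__main__":
    na_5x_ssc = 0.975  # assumed molar Na+ concentration of 5x SSC
    for formamide in (0, 30, 40, 50):
        tm = estimate_tm(500, 50.0, na_5x_ssc, formamide)
        print(f"{formamide:>2}% formamide: estimated Tm {tm:.1f} C")
```

Under this approximation each additional percent of formamide lowers the estimated Tm of a given hybrid, so that a fixed hybridization temperature becomes more stringent as formamide is added, consistent with the ordering of low, moderate and high stringency conditions given above.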

In a specific embodiment, the term "standard hybridization conditions" refers to a Tm of 
55°C, and utilizes conditions as set forth above. In a preferred embodiment, the Tm is 60°C; 
in a more preferred embodiment, the Tm is 65°C. 

"Homologous recombination" refers to the insertion of a foreign DNA sequence of a vector 
in a chromosome. Preferably, the vector targets a specific chromosomal site for homologous 
recombination. For specific homologous recombination, the vector will contain sufficiently 
long regions of homology to sequences of the chromosome to allow complementary binding 
and incorporation of the vector into the chromosome. Longer regions of homology, and 
greater degrees of sequence similarity, may increase the efficiency of homologous 
recombination. 






Accordingly, the term "sequence similarity" in all its grammatical forms refers to the degree 
of identity or correspondence between nucleic acid or amino acid sequences of proteins that 
do not share a common evolutionary origin (see Reeck et al., supra). However, in common 
usage and in the instant application, the term "homologous," when modified with an adverb 
such as "highly," may refer to sequence similarity and not a common evolutionary origin. 

In a specific embodiment, two DNA sequences are "substantially homologous" or 
"substantially similar" when at least about 50% (preferably at least about 75%, and most 
preferably at least about 90 or 95%) of the nucleotides match over the defined length of the 
DNA sequences. Sequences that are substantially homologous can be identified by 
comparing the sequences using standard software available in sequence data banks, or in a 
Southern hybridization experiment under, for example, stringent conditions as defined for 
that particular system. Defining appropriate hybridization conditions is within the skill of the 
art. See, e.g., Maniatis et al., supra; DNA Cloning, Vols. I & II, supra; Nucleic Acid 
Hybridization, supra. 

Similarly, in a particular embodiment, two amino acid sequences are "substantially 
homologous" or "substantially similar" when greater than 30% of the amino acids are 
identical, or greater than about 60% are similar (functionally identical). Preferably, the 
similar or homologous sequences are identified by alignment using, for example, the GCG 
(Genetics Computer Group, Program Manual for the GCG Package, Version 7, Madison, 
Wisconsin) pileup program. 
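
By way of non-limiting illustration only, the percentage thresholds recited above may be applied to two sequences that have already been aligned (for example by an alignment program) as sketched below. The function names, the use of a dash for gaps, and the choice to count only columns in which neither sequence has a gap are assumptions of the example; the sketch does not compute similarity (functional identity), and it does not perform the alignment itself.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned positions at which two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def dna_homology_tier(pct):
    """Map a percent match onto the DNA thresholds recited in the text."""
    if pct >= 90:
        return "most preferred"
    if pct >= 75:
        return "preferred"
    if pct >= 50:
        return "substantially homologous"
    return "not substantially homologous"


def protein_identity_criterion_met(pct):
    """True when greater than 30% of aligned amino acids are identical."""
    return pct > 30


if __name__ == "__main__":
    pct = percent_identity("ACGTAC", "ACGTTT")
    print(round(pct, 1), dna_homology_tier(pct))
```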



The term "corresponding to" is used herein to refer to similar or homologous sequences, 
whether the exact position is identical or different from the molecule to which the similarity 
or homology is measured. Thus, the term "corresponding to" refers to the sequence 
similarity, and not the numbering of the amino acid residues or nucleotide bases. 



Genes Encoding Endonuclease III Proteins 

As illustrated in the Examples herein, the present invention includes the isolation of a gene 
encoding an endonuclease III of the invention, including a full length, or naturally occurring 
form of endonuclease III, and any antigenic fragments thereof from any animal, particularly 
mammalian, and more particularly human, source. As used herein, the term "gene" refers to 
an assembly of nucleotides that encode a polypeptide, and includes cDNA and genomic DNA 
nucleic acids. 



A gene encoding endonuclease III, whether genomic DNA or cDNA, can be isolated from 
any source, particularly from a human cDNA or genomic library. Methods for obtaining an 
endonuclease III gene are well known in the art, as described above (see, e.g., Sambrook et 
al., 1989, supra). 

Accordingly, any animal cell potentially can serve as the nucleic acid source for the 
molecular cloning of an endonuclease III gene. The DNA may be obtained by standard 
procedures known in the art from cloned DNA (e.g., a DNA "library"), and preferably is 
obtained from a cDNA library prepared from tissues with high level expression of the protein 
(e.g., a spleen cDNA library, as described herein), by chemical synthesis, by cDNA cloning, 
or by the cloning of genomic DNA, or fragments thereof, purified from the desired cell (See, 
for example, Sambrook et al., 1989, supra; Glover, D.M. (ed.), 1985, DNA Cloning: A 
Practical Approach, MRL Press, Ltd., Oxford, U.K. Vol. I, II). Clones derived from genomic 
DNA may contain regulatory and intron DNA regions in addition to coding regions; clones 
derived from cDNA will not contain intron sequences. Whatever the source, the gene should 
be molecularly cloned into a suitable vector for propagation of the gene. 

In the molecular cloning of the gene from genomic DNA, DNA fragments are generated, 
some of which will encode the desired gene. The DNA may be cleaved at specific sites using 
various restriction enzymes. Alternatively, one may use DNAse in the presence of 
manganese to fragment the DNA, or the DNA can be physically sheared, as for example, by 
sonication. The linear DNA fragments can then be separated according to size by standard 
techniques, including but not limited to, agarose and polyacrylamide gel electrophoresis and 
column chromatography. 

Once the DNA fragments are generated, identification of the specific DNA fragment 
containing the desired endonuclease III gene may be accomplished in a number of ways. For 
example, if an amount of a portion of an endonuclease III gene or its specific RNA, or a 
fragment thereof, is available and can be purified and labeled, the generated DNA fragments 
may be screened by nucleic acid hybridization to a labeled probe [Benton and Davis, Science, 
196:180 (1977); Grunstein and Hogness, Proc. Natl. Acad. Sci. U.S.A., 72:3961 (1975)]. 
For example, a set of oligonucleotides corresponding to the partial amino acid sequence 
information obtained for the endonuclease III protein can be prepared and used as probes for 
DNA encoding endonuclease III, or as primers for cDNA or mRNA (e.g., in combination 
with a poly-T primer for RT-PCR). Preferably, a fragment is selected that is highly unique to 
endonuclease III of the invention. Those DNA fragments with substantial homology to the 
probe will hybridize. As noted above, the greater the degree of homology, the more stringent 
hybridization conditions can be used. In a specific embodiment, stringent hybridization 
conditions are used to identify a homologous endonuclease III gene. 
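
By way of non-limiting illustration only, the selection of a stretch of partial amino acid sequence from which to prepare a set of oligonucleotides may be sketched by counting, for each candidate stretch, the number of distinct coding sequences that could encode it. The codon counts are those of the standard genetic code; the function names and the example peptide are hypothetical and are not sequences disclosed herein.

```python
# Number of codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    total = 1
    for residue in peptide.upper():
        total *= CODONS_PER_RESIDUE[residue]
    return total


def least_degenerate_window(peptide, window):
    """Return (start, sub-peptide, degeneracy) for the least degenerate stretch."""
    peptide = peptide.upper()
    if not 0 < window <= len(peptide):
        raise ValueError("window must be between 1 and the peptide length")
    starts = range(len(peptide) - window + 1)
    best = min(starts, key=lambda i: probe_degeneracy(peptide[i:i + window]))
    stretch = peptide[best:best + window]
    return best, stretch, probe_degeneracy(stretch)


if __name__ == "__main__":
    print(least_degenerate_window("LRSMWKD", 3))
```

A stretch rich in methionine and tryptophan, each of which has a single codon, requires the smallest set of oligonucleotides to cover every possible coding sequence.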

An endonuclease III gene of the invention can also be identified by mRNA selection, i.e., by 
nucleic acid hybridization followed by in vitro translation. In this procedure, nucleotide 
fragments are used to isolate complementary mRNAs by hybridization. Such DNA 
fragments may represent available, purified endonuclease III DNA, or may be synthetic 
oligonucleotides designed from the partial amino acid sequence information. 
Immunoprecipitation analysis or functional assays (thymine glycol-DNA glycosylase 
activity) of the in vitro translation products of the isolated mRNAs identifies 
the mRNA and, therefore, the complementary DNA fragments, that contain the desired 
sequences. In addition, specific mRNAs may be selected by adsorption of polysomes 
isolated from cells to immobilized antibodies specifically directed against endonuclease III, 
such as the rabbit polyclonal anti-murine endonuclease III antibody described herein. 

A radiolabeled endonuclease III cDNA can be synthesized using the selected mRNA (from 
the adsorbed polysomes) as a template. The radiolabeled mRNA or cDNA may then be used 
as a probe to identify homologous endonuclease III DNA fragments from among other 
genomic DNA fragments. 

The present invention also relates to cloning vectors containing genes encoding analogs and 
derivatives of endonuclease III of the invention, that have the same or homologous functional 
activity as endonuclease III, and homologs thereof from other species. The production and 
use of derivatives and analogs related to endonuclease III are within the scope of the present 
invention. In a specific embodiment, the derivative or analog is functionally active, i.e., 
capable of exhibiting one or more functional activities associated with a full-length, wild-type 
endonuclease III of the invention. 

Endonuclease III derivatives can be made by altering encoding nucleic acid sequences by 
substitutions, additions or deletions that provide for functionally equivalent molecules. 
Preferably, derivatives are made that have enhanced or increased functional activity relative 
to native endonuclease III. Alternatively, such derivatives may encode soluble fragments of 
endonuclease III extracellular domain that have the same or greater affinity for the natural 
ligand of endonuclease III of the invention. Such soluble derivatives may be potent 
inhibitors of ligand binding to endonuclease III. 

Due to the degeneracy of nucleotide coding sequences, other DNA sequences which encode
substantially the same amino acid sequence as an endonuclease III gene may be used in the
practice of the present invention. These include but are not limited to allelic genes,
homologous genes from other species, and nucleotide sequences comprising all or portions of
endonuclease III genes which are altered by the substitution of different codons that encode
the same amino acid residue within the sequence, thus producing a silent change. Likewise,
the endonuclease III derivatives of the invention include, but are not limited to, those
containing, as a primary amino acid sequence, all or part of the amino acid sequence of an
endonuclease III protein including altered sequences in which functionally equivalent amino
acid residues are substituted for residues within the sequence resulting in a conservative
amino acid substitution. For example, one or more amino acid residues within the sequence
can be substituted by another amino acid of a similar polarity, which acts as a functional
equivalent, resulting in a silent alteration. Substitutes for an amino acid within the sequence
may be selected from other members of the class to which the amino acid belongs. For
example, the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine,
proline, phenylalanine, tryptophan and methionine. Amino acids containing aromatic ring
structures are phenylalanine, tryptophan, and tyrosine. The polar neutral amino acids include
glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine. The positively
charged (basic) amino acids include arginine, lysine and histidine. The negatively charged
(acidic) amino acids include aspartic acid and glutamic acid. Such alterations will not be
expected to affect apparent molecular weight as determined by polyacrylamide gel
electrophoresis, or isoelectric point.
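As an illustration of the silent changes described above, the following minimal Python sketch
translates two coding sequences with the standard genetic code and reports whether they encode
the same amino acid sequence. The function names and the short example sequences are
hypothetical and are not taken from any endonuclease III sequence.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence (length divisible by 3) into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent_change(original, altered):
    """True when both sequences encode the same amino acid sequence."""
    return translate(original) == translate(altered)


# Hypothetical example: CTG -> TTA (both Leu) and AAA -> AAG (both Lys).
print(is_silent_change("ATGCTGAAA", "ATGTTAAAG"))
```

A codon change that alters the encoded residue (for example CTG to CCG, Leu to Pro) makes
`is_silent_change` return False.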






Particularly preferred substitutions are: 

- Lys for Arg and vice versa such that a positive charge may be maintained;

- Glu for Asp and vice versa such that a negative charge may be maintained;

- Ser for Thr such that a free -OH can be maintained; and

- Gln for Asn such that a free NH2 can be maintained.
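The amino acid classes and preferred substitutions listed above can be sketched as a simple
lookup. This is only an illustrative sketch: the one-letter codes, the class labels and the
function names are assumptions introduced here, and the preferred pairs are treated as
unordered for simplicity.

```python
# Amino acid classes as listed in the text (one-letter codes).
CLASSES = {
    "nonpolar": set("ALIVPFWM"),
    "aromatic": set("FWY"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}

# Particularly preferred substitutions: Lys/Arg, Glu/Asp, Ser/Thr, Gln/Asn.
PREFERRED = {frozenset("KR"), frozenset("ED"), frozenset("ST"), frozenset("QN")}


def shared_classes(a, b):
    """Return the names of the classes that contain both residues."""
    return {name for name, members in CLASSES.items() if a in members and b in members}


def is_conservative(a, b):
    """A substitution is treated as conservative when both residues share a class."""
    return a == b or bool(shared_classes(a, b))


def is_preferred(a, b):
    """True for the particularly preferred pairs listed in the text."""
    return frozenset((a, b)) in PREFERRED


print(shared_classes("K", "R"), is_preferred("K", "R"))
```

Note that some residues belong to more than one class in the text (for example tyrosine is
listed as both aromatic and polar neutral), which is why `shared_classes` returns a set.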

Amino acid substitutions may also be introduced to substitute an amino acid with a
particularly preferable property. For example, a Cys may be introduced at a potential site for
disulfide bridges with another Cys. A His may be introduced as a particularly "catalytic" site
(i.e., His can act as an acid or base and is the most common amino acid in biochemical
catalysis). Pro may be introduced because of its particularly planar structure, which induces
β-turns in the protein's structure.

The genes encoding endonuclease III derivatives and analogs of the invention can be
produced by various methods known in the art. The manipulations which result in their
production can occur at the gene or protein level. For example, the cloned endonuclease III
gene sequence can be modified by any of numerous strategies known in the art (Sambrook et
al., 1989, supra). The sequence can be cleaved at appropriate sites with restriction
endonuclease(s), followed by further enzymatic modification if desired, isolated, and ligated
in vitro. In the production of the gene encoding a derivative or analog of endonuclease III,
care should be taken to ensure that the modified gene remains within the same translational
reading frame as the endonuclease III gene, uninterrupted by translational stop signals, in the
gene region where the desired activity is encoded.
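The reading-frame precaution above can be checked computationally. The following is a minimal
sketch, assuming the modified coding region is available as a DNA string that begins at the
first codon; the function names and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def preserves_frame(original, modified):
    """An insertion or deletion keeps the downstream frame only if the
    length change is a multiple of three nucleotides."""
    return (len(modified) - len(original)) % 3 == 0


def first_internal_stop(coding_seq):
    """Return the codon index of the first stop codon that is not the final
    codon, or None if the frame is uninterrupted."""
    seq = coding_seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


def frame_is_open(coding_seq):
    """True when the sequence is a whole number of codons with no internal stop."""
    return len(coding_seq) % 3 == 0 and first_internal_stop(coding_seq) is None


print(frame_is_open("ATGAAACCCTAA"))   # terminal stop only
print(frame_is_open("ATGTAACCCTAA"))   # stop at codon index 1
```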

Additionally, the endonuclease III-encoding nucleic acid sequence can be mutated in vitro or
in vivo, to create and/or destroy translation, initiation, and/or termination sequences, or to
create variations in coding regions and/or form new restriction endonuclease sites or destroy
preexisting ones, to facilitate further in vitro modification. Preferably, such mutations
enhance the functional activity of the mutated endonuclease III gene product. Any technique
for mutagenesis known in the art can be used, including but not limited to, in vitro site-directed
mutagenesis [Hutchinson, C., et al., J. Biol. Chem., 253:6551 (1978); Zoller and
Smith, DNA, 3:479-488 (1984); Oliphant et al., Gene, 44:177 (1986); Hutchinson et al.,
Proc. Natl. Acad. Sci. U.S.A., 83:710 (1986)], use of TAB® linkers (Pharmacia), etc. PCR
techniques are preferred for site directed mutagenesis (see Higuchi, 1989, "Using PCR to
Engineer DNA", in PCR Technology: Principles and Applications for DNA Amplification, H.
Erlich, ed., Stockton Press, Chapter 6, pp. 61-70).

The identified and isolated gene can then be inserted into an appropriate cloning vector. A
large number of vector-host systems known in the art may be used. Possible vectors include,
but are not limited to, plasmids or modified viruses, but the vector system must be
compatible with the host cell used. Examples of vectors include, but are not limited to, E.
coli, bacteriophages such as lambda derivatives, or plasmids such as pBR322 derivatives or
pUC plasmid derivatives, e.g., pGEX vectors, pmal-c, pFLAG, etc. The insertion into a
cloning vector can, for example, be accomplished by ligating the DNA fragment into a
cloning vector which has complementary cohesive termini. However, if the complementary
restriction sites used to fragment the DNA are not present in the cloning vector, the ends of
the DNA molecules may be enzymatically modified. Alternatively, any site desired may be
produced by ligating nucleotide sequences (linkers) onto the DNA termini; these ligated
linkers may comprise specific chemically synthesized oligonucleotides encoding restriction
endonuclease recognition sequences. Recombinant molecules can be introduced into host
cells via transformation, transfection, infection, electroporation, etc., so that many copies of
the gene sequence are generated. Preferably, the cloned gene is contained on a shuttle vector
plasmid, which provides for expansion in a cloning cell, e.g., E. coli, and facile purification
for subsequent insertion into an appropriate expression cell line, if such is desired. For
example, a shuttle vector, which is a vector that can replicate in more than one type of
organism, can be prepared for replication in both E. coli and Saccharomyces cerevisiae by
linking sequences from an E. coli plasmid with sequences from the yeast 2μ plasmid.
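Whether a fragment and a vector have complementary cohesive termini, as discussed above, can
be sketched by comparing the single-stranded overhangs left by the restriction endonucleases
used. The small enzyme table and the function names below are illustrative assumptions; only a
few well-known enzymes that leave 5' overhangs are included.

```python
# Recognition site (top strand, 5'->3') and top-strand cut position for a few
# common enzymes that leave 5' overhangs (e.g., EcoRI cuts G^AATTC).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "BglII": ("AGATCT", 1),
    "XhoI": ("CTCGAG", 1),
    "SalI": ("GTCGAC", 1),
}


def overhang(enzyme):
    """Return the 5' overhang produced by a palindromic site cut symmetrically."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def cohesive_ends_compatible(enzyme_a, enzyme_b):
    """Ends can anneal when the two enzymes leave the same palindromic overhang."""
    return overhang(enzyme_a) == overhang(enzyme_b)


print(overhang("EcoRI"), cohesive_ends_compatible("BamHI", "BglII"))
```

When the function returns False for the chosen enzyme pair, the text's alternatives apply:
enzymatic modification of the ends or ligation of linkers carrying the desired site.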

In an alternative method, the desired gene may be identified and isolated after insertion into a
suitable cloning vector in a "shot gun" approach. Enrichment for the desired gene, for
example, by size fractionation, can be done before insertion into the cloning vector.

Expression of Endonuclease III Polypeptides

The nucleotide sequence coding for endonuclease III, or antigenic fragment, derivative or
analog thereof, or a functionally active derivative, including a chimeric protein, thereof, can
be inserted into an appropriate expression vector, i.e., a vector which contains the necessary
elements for the transcription and translation of the inserted protein-coding sequence. Such
elements are termed herein a "promoter." Thus, the nucleic acid encoding endonuclease III
of the invention is operationally associated with a promoter in an expression vector of the
invention. Both cDNA and genomic sequences can be cloned and expressed under control of
such regulatory sequences. An expression vector also preferably includes a replication
origin.

The necessary transcriptional and translational signals can be provided on a recombinant 
expression vector, or they may be supplied by the native gene encoding endonuclease III 
and/or its flanking regions. 

Potential host-vector systems include but are not limited to mammalian cell systems infected
with virus (e.g., vaccinia virus, adenovirus, etc.); insect cell systems infected with virus (e.g.,
baculovirus); microorganisms such as yeast containing yeast vectors; or bacteria transformed
with bacteriophage DNA, plasmid DNA, or cosmid DNA. The expression elements of
vectors vary in their strengths and specificities. Depending on the host-vector system
utilized, any one of a number of suitable transcription and translation elements may be used.

A recombinant endonuclease III protein of the invention, or functional fragment, derivative,
chimeric construct, or analog thereof, may be expressed chromosomally, after integration of
the coding sequence by recombination. In this regard, any of a number of amplification
systems may be used to achieve high levels of stable gene expression (see Sambrook et al.,
1989, supra).



The cell containing the recombinant vector comprising the nucleic acid encoding
endonuclease III is cultured in an appropriate cell culture medium under conditions that
provide for expression of endonuclease III by the cell.

Any of the methods previously described for the insertion of DNA fragments into a cloning
vector may be used to construct expression vectors containing a gene consisting of
appropriate transcriptional/translational control signals and the protein coding sequences.
These methods may include in vitro recombinant DNA and synthetic techniques and in vivo
recombination (genetic recombination).

Expression of endonuclease III protein may be controlled by any promoter/enhancer element
known in the art, but these regulatory elements must be functional in the host selected for
expression. Promoters which may be used to control endonuclease III gene expression
include, but are not limited to, the SV40 early promoter region (Benoist and Chambon, 1981,
Nature 290:304-310), the promoter contained in the 3' long terminal repeat of Rous sarcoma
virus (Yamamoto, et al., 1980, Cell 22:787-797), the herpes thymidine kinase promoter
(Wagner et al., 1981, Proc. Natl. Acad. Sci. U.S.A. 78:1441-1445), the regulatory sequences
of the metallothionein gene (Brinster et al., 1982, Nature 296:39-42); prokaryotic expression
vectors such as the β-lactamase promoter (Villa-Kamaroff, et al., 1978, Proc. Natl. Acad. Sci.
U.S.A. 75:3727-3731), or the tac promoter (DeBoer, et al., 1983, Proc. Natl. Acad. Sci.
U.S.A. 80:21-25); see also "Useful proteins from recombinant bacteria" in Scientific
American, 1980, 242:74-94; promoter elements from yeast or other fungi such as the Gal 4
promoter, the ADC (alcohol dehydrogenase) promoter, PGK (phosphoglycerol kinase)
promoter, alkaline phosphatase promoter; and the animal transcriptional control regions,
which exhibit tissue specificity and have been utilized in transgenic animals: elastase I gene
control region which is active in pancreatic acinar cells (Swift et al., 1984, Cell 38:639-646;
Ornitz et al., 1986, Cold Spring Harbor Symp. Quant. Biol. 50:399-409; MacDonald, 1987,
Hepatology 7:425-515); insulin gene control region which is active in pancreatic beta cells
(Hanahan, 1985, Nature 315:115-122), immunoglobulin gene control region which is active
in lymphoid cells (Grosschedl et al., 1984, Cell 38:647-658; Adames et al., 1985, Nature
318:533-538; Alexander et al., 1987, Mol. Cell. Biol. 7:1436-1444), mouse mammary tumor
virus control region which is active in testicular, breast, lymphoid and mast cells (Leder et
al., 1986, Cell 45:485-495), albumin gene control region which is active in liver (Pinkert et
al., 1987, Genes and Devel. 1:268-276), alpha-fetoprotein gene control region which is active
in liver (Krumlauf et al., 1985, Mol. Cell. Biol. 5:1639-1648; Hammer et al., 1987, Science
235:53-58), alpha 1-antitrypsin gene control region which is active in the liver (Kelsey et al.,
1987, Genes and Devel. 1:161-171), beta-globin gene control region which is active in
myeloid cells (Mogram et al., 1985, Nature 315:338-340; Kollias et al., 1986, Cell 46:89-94),
myelin basic protein gene control region which is active in oligodendrocyte cells in the brain
(Readhead et al., 1987, Cell 48:703-712), myosin light chain-2 gene control region which is
active in skeletal muscle (Sani, 1985, Nature 314:283-286), and gonadotropic releasing
hormone gene control region which is active in the hypothalamus (Mason et al., 1986,
Science 234:1372-1378).

Expression vectors containing a nucleic acid encoding an endonuclease III of the invention
can be identified by four general approaches: (a) PCR amplification of the desired plasmid
DNA or specific mRNA, (b) nucleic acid hybridization, (c) presence or absence of selection
marker gene functions, and (d) expression of inserted sequences. In the first approach, the
nucleic acids can be amplified by PCR to provide for detection of the amplified product. In
the second approach, the presence of a foreign gene inserted in an expression vector can be
detected by nucleic acid hybridization using probes comprising sequences that are
homologous to an inserted marker gene. In the third approach, the recombinant vector/host
system can be identified and selected based upon the presence or absence of certain
"selection marker" gene functions (e.g., β-galactosidase activity, thymidine kinase activity,
resistance to antibiotics, transformation phenotype, occlusion body formation in baculovirus,
etc.) caused by the insertion of foreign genes in the vector. In another example, if the nucleic
acid encoding endonuclease III is inserted within the "selection marker" gene sequence of the
vector, recombinants containing the endonuclease III insert can be identified by the absence
of the selection marker gene function. In the fourth approach, recombinant expression
vectors can be identified by assaying for the activity, biochemical, or immunological
characteristics of the gene product expressed by the recombinant, provided that the expressed
protein assumes a functionally active conformation.

A wide variety of host/expression vector combinations may be employed in expressing the
DNA sequences of this invention. Useful expression vectors, for example, may consist of
segments of chromosomal, non-chromosomal and synthetic DNA sequences. Suitable
vectors include derivatives of SV40 and known bacterial plasmids, e.g., E. coli plasmids col
E1, pCR1, pBR322, pMal-C2, pET, pGEX (Smith et al., 1988, Gene 67:31-40), pMB9 and
their derivatives, plasmids such as RP4; phage DNAs, e.g., the numerous derivatives of
phage λ, e.g., NM989, and other phage DNA, e.g., M13 and filamentous single stranded
phage DNA; yeast plasmids such as the 2μ plasmid or derivatives thereof; vectors useful in
eukaryotic cells, such as vectors useful in insect or mammalian cells; vectors derived from
combinations of plasmids and phage DNAs, such as plasmids that have been modified to
employ phage DNA or other expression control sequences; and the like.

For example, in a baculovirus expression system, both non-fusion transfer vectors, such as
but not limited to pVL941 (BamHI cloning site; Summers), pVL1393 (BamHI, SmaI, XbaI,
EcoRI, NotI, XmaIII, BglII, and PstI cloning site; Invitrogen), pVL1392 (BglII, PstI, NotI,
XmaIII, EcoRI, XbaI, SmaI, and BamHI cloning site; Summers and Invitrogen), and
pBlueBacIII (BamHI, BglII, PstI, NcoI, and HindIII cloning site, with blue/white
recombinant screening possible; Invitrogen), and fusion transfer vectors, such as but not
limited to pAc700 (BamHI and KpnI cloning site, in which the BamHI recognition site
begins with the initiation codon; Summers), pAc701 and pAc702 (same as pAc700, with
different reading frames), pAc360 (BamHI cloning site 36 base pairs downstream of a
polyhedrin initiation codon; Invitrogen (195)), and pBlueBacHisA, B, C (three different
reading frames, with BamHI, BglII, PstI, NcoI, and HindIII cloning site, an N-terminal
peptide for ProBond purification, and blue/white recombinant screening of plaques;
Invitrogen (220)) can be used.

Mammalian expression vectors contemplated for use in the invention include vectors with
inducible promoters, such as the dihydrofolate reductase (DHFR) promoter, e.g., any
expression vector with a DHFR expression vector, or a DHFR/methotrexate co-amplification
vector, such as pED (PstI, SalI, SbaI, SmaI, and EcoRI cloning site, with the vector
expressing both the cloned gene and DHFR; see Kaufman, Current Protocols in Molecular
Biology, 16.12 (1991)). Alternatively, a glutamine synthetase/methionine sulfoximine
co-amplification vector, such as pEE14 (HindIII, XbaI, SmaI, SbaI, EcoRI, and BclI cloning site,
in which the vector expresses glutamine synthase and the cloned gene; Celltech). In another
embodiment, a vector that directs episomal expression under control of Epstein Barr Virus
(EBV) can be used, such as pREP4 (BamHI, SfiI, XhoI, NotI, NheI, HindIII, NheI, PvuII, and
KpnI cloning site, constitutive RSV-LTR promoter, hygromycin selectable marker;
Invitrogen), pCEP4 (BamHI, SfiI, XhoI, NotI, NheI, HindIII, NheI, PvuII, and KpnI cloning
site, constitutive hCMV immediate early gene, hygromycin selectable marker; Invitrogen),
pMEP4 (KpnI, PvuI, NheI, HindIII, NotI, XhoI, SfiI, BamHI cloning site, inducible
metallothionein IIa gene promoter, hygromycin selectable marker; Invitrogen), pREP8
(BamHI, XhoI, NotI, HindIII, NheI, and KpnI cloning site, RSV-LTR promoter, histidinol
selectable marker; Invitrogen), pREP9 (KpnI, NheI, HindIII, NotI, XhoI, SfiI, and BamHI
cloning site, RSV-LTR promoter, G418 selectable marker; Invitrogen), and pEBVHis (RSV-LTR
promoter, hygromycin selectable marker, N-terminal peptide purifiable via ProBond
resin and cleaved by enterokinase; Invitrogen). Selectable mammalian expression vectors for
use in the invention include pRc/CMV (HindIII, BstXI, NotI, SbaI, and ApaI cloning site,
G418 selection; Invitrogen), pRc/RSV (HindIII, SpeI, BstXI, NotI, XbaI cloning site, G418
selection; Invitrogen), and others. Vaccinia virus mammalian expression vectors (see
Kaufman, 1991, supra) for use according to the invention include but are not limited to
pSC11 (SmaI cloning site, TK- and β-gal selection), pMJ601 (SalI, SmaI, AflI, NarI, BspMII,
BamHI, ApaI, NheI, SacII, KpnI, and HindIII cloning site; TK- and β-gal selection), and
pTKgptF1S (EcoRI, PstI, SalI, AccI, HindII, SbaI, BamHI, and Hpa cloning site, TK or
XPRT selection).

Yeast expression systems can also be used according to the invention to express the
endonuclease III protein. For example, the non-fusion pYES2 vector (XbaI, SphI, ShoI, NotI,
EcoRI, BstXI, BamHI, SacI, KpnI, and HindIII cloning site; Invitrogen) or the fusion
pYESHisA, B, C (XbaI, SphI, ShoI, NotI, BstXI, EcoRI, BamHI, SacI, KpnI, and HindIII
cloning site, N-terminal peptide purified with ProBond resin and cleaved with enterokinase;
Invitrogen), to mention just two, can be employed according to the invention.

Once a particular recombinant DNA molecule is identified and isolated, several methods
known in the art may be used to propagate it. Once a suitable host system and growth
conditions are established, recombinant expression vectors can be propagated and prepared
in quantity. As previously explained, the expression vectors which can be used include, but
are not limited to, the following vectors or their derivatives: human or animal viruses such
as vaccinia virus or adenovirus; insect viruses such as baculovirus; yeast vectors;
bacteriophage vectors (e.g., lambda), and plasmid and cosmid DNA vectors, to name but a
few.



In addition, a host cell strain may be chosen which modulates the expression of the inserted
sequences, or modifies and processes the gene product in the specific fashion desired.
Different host cells have characteristic and specific mechanisms for the translational and
post-translational processing and modification (e.g., glycosylation, cleavage [e.g., of signal
sequence]) of proteins. Appropriate cell lines or host systems can be chosen to ensure the
desired modification and processing of the foreign protein expressed. For example,
expression in a bacterial system can be used to produce a nonglycosylated core protein
product.

Vectors are introduced into the desired host cells by methods known in the art, e.g.,
transfection, electroporation, microinjection, transduction, cell fusion, DEAE dextran,
calcium phosphate precipitation, lipofection (lysosome fusion), use of a gene gun, or a DNA
vector transporter (see, e.g., Wu et al., 1992, J. Biol. Chem. 267:963-967; Wu and Wu, 1988,
J. Biol. Chem. 263:14621-14624; Hartmut et al., Canadian Patent Application No. 2,012,311,
filed March 15, 1990).



In a specific embodiment, an endonuclease III fusion protein can be expressed. An
endonuclease III fusion protein comprises at least a functionally active portion of a
non-endonuclease III protein joined via a peptide bond to at least a functionally active portion of
an endonuclease III polypeptide. The non-endonuclease III sequences can be amino- or
carboxy-terminal to the endonuclease III sequences. More preferably, for stable expression
of a proteolytically inactive endonuclease III fusion protein, the portion of the
non-endonuclease III fusion protein is joined via a peptide bond to the amino terminus of the
endonuclease III protein. A recombinant DNA molecule encoding such a fusion protein
comprises a sequence encoding at least a functionally active portion of a non-endonuclease
III protein joined in-frame to the endonuclease III coding sequence, and preferably encodes a
cleavage site for a specific protease, e.g., thrombin or Factor Xa, preferably at the
endonuclease III-non-endonuclease III juncture. In a specific embodiment, the fusion protein
is expressed in Escherichia coli. In one preferred embodiment a glutathione S-transferase
(GST)-endonuclease III fusion protein is prepared as described in Example 3, herein.
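The in-frame joining of a fusion partner, a protease cleavage site and a coding sequence can be
sketched as follows. This is an illustration under stated assumptions only: the linker codons
encode the commonly used thrombin (Leu-Val-Pro-Arg-Gly-Ser) and Factor Xa (Ile-Glu-Gly-Arg)
recognition sequences, the partner and target sequences are short hypothetical placeholders,
and the function names are not part of any described method.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Assumed linker codons for two protease recognition sequences.
CLEAVAGE_SITES = {
    "thrombin": "CTGGTTCCGCGTGGATCC",   # Leu-Val-Pro-Arg-Gly-Ser
    "factor_xa": "ATCGAAGGTCGT",        # Ile-Glu-Gly-Arg
}


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def build_fusion(partner_cds, target_cds, protease="thrombin"):
    """Join partner (amino-terminal), cleavage site and target in one reading frame."""
    partner, target = partner_cds.upper(), target_cds.upper()
    if len(partner) % 3 or len(target) % 3:
        raise ValueError("both coding sequences must be whole numbers of codons")
    if partner[-3:] in STOP_CODONS:
        partner = partner[:-3]          # drop the partner's stop so translation reads through
    return partner + CLEAVAGE_SITES[protease] + target


# Hypothetical placeholder sequences, not real GST or endonuclease III sequences.
fusion = build_fusion("ATGTCCCCTTAA", "ATGACCTAA", "thrombin")
print(translate(fusion))
```

Because every component is a whole number of codons, the cleavage site and the downstream
coding sequence stay in the same reading frame as the amino-terminal partner.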

General Protein Purification Procedures

Initial steps for purifying the mammalian endonuclease III polypeptides of the present
invention include salting in or salting out, such as in ammonium sulfate fractionations;
solvent exclusion fractionations, e.g., an ethanol precipitation; detergent extractions to free
membrane bound proteins using such detergents as Triton X-100, Tween-20, etc.; or high salt
extractions. Solubilization of proteins may also be achieved using aprotic solvents such as
dimethyl sulfoxide and hexamethylphosphoramide. In addition, high speed
ultracentrifugation may be used either alone or in conjunction with other extraction
techniques.



Generally good secondary isolation or purification steps include solid phase adsorption using
calcium phosphate gel or hydroxyapatite; or solid phase binding. Solid phase binding may
be performed through ionic bonding, with either an anion exchanger, such as
diethylaminoethyl (DEAE), or diethyl [2-hydroxypropyl] aminoethyl (QAE) Sephadex or
cellulose; or with a cation exchanger such as carboxymethyl (CM) or sulfopropyl (SP)
Sephadex or cellulose. Alternative means of solid phase binding include the exploitation of
hydrophobic interactions, e.g., the use of a solid support such as phenylSepharose and a
high salt buffer; affinity binding, e.g., placing a substrate analog on an activated
support; immuno-binding, using, e.g., an antibody to the endonuclease III bound to an
activated support; as well as other solid phase supports including those that contain specific
dyes or lectins, etc. A further solid phase support technique that is often used at the end of
the purification procedure relies on size exclusion, such as Sephadex and Sepharose gels, or
pressurized or centrifugal membrane techniques, using size exclusion membrane filters.

Solid phase support separations are generally performed batch-wise with low-speed
centrifugations or by column chromatography. High performance liquid chromatography
(HPLC), including such related techniques as FPLC, is presently the most common means of
performing liquid chromatography. Size exclusion techniques may also be accomplished
with the aid of low speed centrifugation.

In addition, size permeation techniques such as gel electrophoretic techniques may be
employed. These techniques are generally performed in tubes, slabs or by capillary
electrophoresis.

Almost all steps involving protein purification employ a buffered solution. Unless otherwise
specified, generally 25-100 mM concentrations are used. Low concentration buffers
generally imply 5-25 mM concentrations. High concentration buffers generally imply
concentrations of the buffering agent of between 0.1 and 2 M. Typical buffers can
be purchased from most biochemical catalogues and include the classical buffers such as
Tris, pyrophosphate, monophosphate and diphosphate, and the Good buffers [Good, N.E., et
al., (1966) Biochemistry, 5, 467; Good, N.E. and Izawa, S., (1972) Meth. Enzymol., 24, Part
B, 53; and Ferguson, W.J. and Good, N.E., (1980) Anal. Biochem. 104, 300] such as Mes,
Hepes, Mops, tricine and Ches. Materials to perform all of these techniques are available
from a variety of sources such as Sigma Chemical Company in St. Louis, Missouri.
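The buffer concentration ranges stated above can be captured in a small helper. This is only a
sketch: the category labels and the function name are assumptions, and because the stated
ranges share their endpoints (25 mM and 100 mM), the boundaries are arbitrarily assigned to
the higher category here.

```python
def classify_buffer(concentration_mM):
    """Map a buffer concentration in mM onto the ranges given in the text.

    low:      5 to <25 mM
    standard: 25 to <100 mM (the default "unless otherwise specified")
    high:     100 mM to 2 M
    Returns None for values outside all stated ranges.
    """
    if 5 <= concentration_mM < 25:
        return "low"
    if 25 <= concentration_mM < 100:
        return "standard"
    if 100 <= concentration_mM <= 2000:
        return "high"
    return None


for value in (10, 50, 500):
    print(value, "mM ->", classify_buffer(value))
```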

Specific purification procedures for the endonuclease III polypeptide are exemplified in
Example 1, and for the corresponding fusion protein in Example 3.

Antibodies to the Endonuclease III

According to the invention, endonuclease III produced recombinantly, from natural sources
or by chemical synthesis, and fragments or other derivatives or analogs thereof, including
fusion proteins, may be used as an immunogen to generate antibodies that recognize the
endonuclease III polypeptide. Such antibodies include but are not limited to polyclonal,
monoclonal, chimeric, single chain, Fab fragments, and an Fab expression library. The
anti-endonuclease III antibodies of the invention may be cross reactive, e.g., they may recognize
endonuclease III from different mammalian species. Polyclonal antibodies have greater
likelihood of cross reactivity. Alternatively, an antibody of the invention may be specific for
a single form of endonuclease III, such as rat endonuclease III. Preferably, such an antibody
is specific for human endonuclease III.

Various procedures known in the art may be used for the production of polyclonal antibodies
to endonuclease III or derivative or analog thereof. For the production of antibody, various
host animals can be immunized by injection with the endonuclease III, or a derivative (e.g.,
fragment or fusion protein) thereof, including but not limited to rabbits, mice, rats, sheep,
goats, etc. In one embodiment, the endonuclease III polypeptide or fragment thereof can be
conjugated to an immunogenic carrier, e.g., bovine serum albumin (BSA) or keyhole limpet
hemocyanin (KLH). Various adjuvants may be used to increase the immunological response,
depending on the host species, including but not limited to Freund's (complete and
incomplete), mineral gels such as aluminum hydroxide, surface active substances such as
lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet
hemocyanins, dinitrophenol, and potentially useful human adjuvants such as BCG (bacille
Calmette-Guerin) and Corynebacterium parvum.

For preparation of monoclonal antibodies directed toward the endonuclease III, or fragment,
analog, or derivative thereof, any technique that provides for the production of antibody
molecules by continuous cell lines in culture may be used. These include but are not limited
to the hybridoma technique originally developed by Kohler and Milstein [Nature 256:495-497
(1975)], as well as the trioma technique, the human B-cell hybridoma technique [Kozbor
et al., Immunology Today 4:72 (1983); Cote et al., Proc. Natl. Acad. Sci. U.S.A. 80:2026-2030
(1983)], and the EBV-hybridoma technique to produce human monoclonal antibodies [Cole
et al., in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96 (1985)].
In an additional embodiment of the invention, monoclonal antibodies can be produced in
germ-free animals utilizing recent technology [PCT/US90/02545]. In fact, according to the
invention, techniques developed for the production of "chimeric antibodies" [Morrison et al.,
J. Bacteriol. 159:870 (1984); Neuberger et al., Nature 312:604-608 (1984); Takeda et al.,
Nature 314:452-454 (1985)] by splicing the genes from a mouse antibody molecule specific
for an endonuclease III together with genes from a human antibody molecule of appropriate
biological activity can be used; such antibodies are within the scope of this invention. Such
human or humanized chimeric antibodies are preferred for use in therapy of human diseases
or disorders, since the human or humanized antibodies are much less likely than xenogenic
antibodies to induce an immune response, in particular an allergic response, themselves.

According to the invention, techniques described for the production of single chain
antibodies [U.S. Patent Nos. 5,476,786 and 5,132,405 to Huston; U.S. Patent 4,946,778] can
be adapted to produce mammalian endonuclease III polypeptide-specific single chain
antibodies. An additional embodiment of the invention utilizes the techniques described for
the construction of Fab expression libraries [Huse et al., Science 246:1275-1281 (1989)] to
allow rapid and easy identification of monoclonal Fab fragments with the desired specificity
for an endonuclease III polypeptide, or its derivatives, or analogs.

Antibody fragments which contain the idiotype of the antibody molecule can be generated by
known techniques. For example, such fragments include but are not limited to: the F(ab')2
fragment which can be produced by pepsin digestion of the antibody molecule; the Fab'
fragments which can be generated by reducing the disulfide bridges of the F(ab')2 fragment,
and the Fab fragments which can be generated by treating the antibody molecule with papain
and a reducing agent.

In the production of antibodies, screening for the desired antibody can be accomplished by 
techniques known in the art, e.g., radioimmunoassay, ELISA (enzyme-linked immunosorbent
assay), "sandwich" immunoassays, immunoradiometric assays, gel diffusion precipitin 
reactions, immunodiffusion assays, in situ immunoassays (using colloidal gold, enzyme or 
radioisotope labels, for example), western blots, precipitation reactions, agglutination assays 
(e.g., gel agglutination assays, hemagglutination assays), complement fixation assays, 
immunofluorescence assays, protein A assays, and immunoelectrophoresis assays, etc. In
one embodiment, antibody binding is detected by detecting a label on the primary antibody. 
In another embodiment, the primary antibody is detected by detecting binding of a secondary 
antibody or reagent to the primary antibody. In a further embodiment, the secondary 
antibody is labeled. Many means are known in the art for detecting binding in an 
immunoassay and are within the scope of the present invention. For example, to select 
antibodies which recognize a specific epitope of an endonuclease III polypeptide, one may 
assay generated hybridomas for a product which binds to an endonuclease III polypeptide 
fragment containing such epitope. For selection of an antibody specific to an endonuclease 
III polypeptide from a particular mammalian species of animal, one can select on the basis of 
positive binding with endonuclease III polypeptide expressed by or isolated from cells of that 
species of mammal. 

The foregoing antibodies can be used in methods known in the art relating to the localization 
and activity of the endonuclease III polypeptide, e.g., for Western blotting, imaging 
endonuclease III polypeptide in situ, measuring levels thereof in appropriate physiological 
samples, etc. using any of the detection techniques mentioned above or known in the art. In a
specific embodiment, antibodies that agonize or antagonize the activity of mammalian
endonuclease III polypeptide can be generated. 

Antisense and Ribozymes Against Endonuclease III

The present invention extends to the preparation of antisense nucleotides and ribozymes that 
may be used to interfere with the expression of mammalian endonuclease III at the 
translational level. This approach utilizes antisense nucleic acid and ribozymes to block
translation of a specific mRNA, either by masking that mRNA with an antisense nucleic acid 
or cleaving it with a ribozyme. Such methods can be used in preparing cells and/or 
organisms that lack functional endonuclease III. 



Antisense nucleic acids are DNA or RNA molecules that are complementary to at least a
portion of a specific mRNA molecule [see Marcus-Sekura, Anal. Biochem. 172:298 (1988)].
In the cell, they hybridize to that mRNA, forming a double-stranded molecule. The cell does
not translate an mRNA in this double-stranded form. Therefore, antisense nucleic acids
interfere with the expression of mRNA into protein. Oligomers of about fifteen nucleotides
and molecules that hybridize to the AUG initiation codon will be particularly efficient, since
they are easy to synthesize and are likely to pose fewer problems than larger molecules when
introducing them into organ cells. Antisense methods have been used to inhibit the
expression of many genes in vitro [Marcus-Sekura, 1988, supra; Hambor et al., J. Exp. Med.
168:1237 (1988)]. Preferably synthetic antisense nucleotides contain phosphoester
analogs, such as phosphorothiolates, or thioesters, rather than natural phosphoester bonds.
Such phosphoester bond analogs are more resistant to degradation, increasing the stability,
and therefore the efficacy, of the antisense nucleic acids.
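
By way of illustration only, the selection of an oligomer of about fifteen nucleotides that hybridizes across the AUG initiation codon may be sketched as follows. The transcript shown is a made-up sequence and is not the endonuclease III mRNA; the function names and the choice of six upstream nucleotides are assumptions introduced for the example.

```python
# Illustrative sketch: pick a 15-nt antisense RNA spanning the first AUG.
# The example transcript is hypothetical, not the endonuclease III mRNA.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(target: str) -> str:
    """Return the reverse complement of an RNA target, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(target.upper()))


def aug_window(mrna: str, length: int = 15, upstream: int = 6) -> str:
    """Return `length` nt of mRNA starting `upstream` nt before the first AUG."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start < 0:
        raise ValueError("no AUG codon found")
    begin = max(0, start - upstream)
    window = mrna[begin:begin + length]
    if len(window) < length:
        raise ValueError("mRNA too short for the requested window")
    return window


example_mrna = "GGCACCGCCAUGGCUAGCUUGGAC"  # hypothetical transcript
target = aug_window(example_mrna)
oligo = antisense_rna(target)
print(target, oligo)
```

The oligomer returned is complementary to the window containing the start codon, so it contains the triplet CAU opposite AUG.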

Ribozymes are RNA molecules possessing the ability to specifically cleave other single
stranded RNA molecules in a manner somewhat analogous to DNA restriction
endonucleases. Ribozymes were discovered from the observation that certain mRNAs have
the ability to excise their own introns. By modifying the nucleotide sequence of these RNAs,
researchers have been able to engineer molecules that recognize specific nucleotide
sequences in an RNA molecule and cleave it [Cech, J. Am. Med. Assoc. 260:3030 (1988)].
Because they are sequence-specific, only mRNAs with particular sequences are inactivated.
Investigators have identified two types of ribozymes, Tetrahymena-type and
"hammerhead"-type. Tetrahymena-type ribozymes recognize four-base sequences, while
"hammerhead"-type recognize eleven- to eighteen-base sequences. The longer the
recognition sequence, the more likely it is to occur exclusively in the target mRNA species.
Therefore, hammerhead-type ribozymes are preferable to Tetrahymena-type ribozymes for
inactivating a specific mRNA species, and eighteen base recognition sequences are
preferable to shorter recognition sequences.
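
The preference for longer recognition sequences can be illustrated with a rough estimate. The sketch below assumes, purely for illustration, that bases occur independently and with equal frequency, and that the pool of distinct mRNA sequence in a cell is about 1e7 nucleotides; neither figure comes from the present disclosure.

```python
# Rough estimate of how often a given recognition site occurs by chance.
# Assumes uniform, independent bases; POOL is an illustrative figure only.
def expected_chance_sites(site_length: int, rna_nucleotides: float) -> float:
    """Expected exact matches to one specific site in random RNA sequence."""
    return rna_nucleotides * 0.25 ** site_length


POOL = 1e7  # assumed nucleotides of distinct mRNA sequence (hypothetical)

for n in (4, 11, 18):
    print(f"{n:2d}-base site: {expected_chance_sites(n, POOL):.3g} chance matches")
```

Under these assumptions a four-base site is expected many thousands of times, an eleven-base site a few times, and an eighteen-base site far less than once, which is consistent with the stated preference.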

The DNA sequences encoding the endonuclease III described and enabled herein may thus be
used to prepare antisense molecules (which block translation of mRNA encoding
endonuclease III) and ribozymes (that cleave mRNAs for the endonuclease III), thus
inhibiting expression of the gene encoding the endonuclease III, which thereby reduces the
level of endonuclease III in a target mammalian cell.



Labels 



An endonuclease III of the present invention, including a full length, or naturally occurring 
form of endonuclease III, and any antigenic fragments thereof from any animal, particularly 
mammalian and more particularly a human source can be labeled. In addition, antibodies to 
the endonuclease III of the present invention, and nucleic acids that encode mammalian 
endonuclease III or fragments thereof, and probes that hybridize to such nucleic acids can 
also be labeled. 



Suitable labels include enzymes, fluorophores (e.g., fluorescein isothiocyanate (FITC),
phycoerythrin (PE), Texas red (TR), rhodamine, free or chelated lanthanide series salts,
especially Eu3+, to name a few fluorophores), chromophores, radioisotopes, chelating agents,
dyes, colloidal gold, latex particles, ligands (e.g., biotin), and chemiluminescent agents.
When a control marker is employed, the same or different labels may be used for the receptor
and control marker.



In the instance where a radioactive label, such as the isotopes 3H, 14C, 32P, 35S, 36Cl, 51Cr, 57Co,
58Co, 59Fe, 90Y, 125I, and 186Re, is used, known currently available counting procedures
may be utilized. In the instance where the label is an enzyme, detection may be
accomplished by any of the presently utilized colorimetric, spectrophotometric,
fluorospectrophotometric, amperometric or gasometric techniques known in the art.

Direct labels are one example of labels which can be used according to the present invention.
A direct label has been defined as an entity, which in its natural state, is readily visible, either
to the naked eye, or with the aid of an optical filter and/or applied stimulation, e.g., U.V. light
to promote fluorescence. Examples of colored labels which can be used according to
the present invention include metallic sol particles, for example, gold sol particles such as
those described by Leuvering (U.S. Patent 4,313,734); dye sol particles such as described by
Gribnau et al. (U.S. Patent 4,373,932) and May et al. (WO 88/08534); dyed latex such as
described by May, supra, Snyder (EP-A 0 280 559 and 0 281 327); or dyes encapsulated in
liposomes as described by Campbell et al. (U.S. Patent 4,703,017). Other direct labels
include a radionucleotide, a fluorescent moiety or a luminescent moiety. In addition to these
direct labelling devices, indirect labels comprising enzymes can also be used according to the
present invention. Various types of enzyme linked immunoassays are well known in the art,
for example, those using alkaline phosphatase, horseradish peroxidase, lysozyme, glucose-6-
phosphate dehydrogenase, lactate dehydrogenase, and urease; these and others have been
discussed in detail by Eva Engvall in Enzyme Immunoassay ELISA and EMIT in Methods in
Enzymology, 70:419-439 (1980) and in U.S. Patent 4,857,453.

Suitable enzymes include, but are not limited to, alkaline phosphatase and horseradish
peroxidase.



Other labels for use in the invention include magnetic beads or magnetic resonance imaging 
labels. 



In another embodiment, a phosphorylation site can be created on an antibody of the invention
for labeling with 32P, e.g., as described in European Patent No. 0372707 (Application No.
89311108.8) to Pestka, or U.S. Patent No. 5,459,240, issued October 17, 1995 to Foxwell et
al.

As exemplified herein, proteins, including an endonuclease III of the present invention and
antibodies thereto, can be labeled by metabolic labeling. Metabolic labeling occurs during in
vitro incubation of the cells that express the protein in the presence of culture medium
supplemented with a metabolic label, such as [35S]-methionine or [32P]-orthophosphate. In
addition to metabolic (or biosynthetic) labeling with [35S]-methionine, the invention further
contemplates labeling with [14C]-amino acids and [3H]-amino acids (with the tritium
substituted at non-labile positions).



Administration

According to the invention, the component or components of a therapeutic composition of 
the invention may be introduced parenterally, transmucosally, e.g., orally, nasally, or 
rectally, or transdermally. Preferably, administration is parenteral, e.g., via intravenous
injection, and also includes, but is not limited to, intra-arteriole, intramuscular, intradermal,
subcutaneous, intraperitoneal, intraventricular, and intracranial administration. More
preferably, where administration of endonuclease III is indicated to act to repair DNA 
damage and/or as a tumor suppressor of a tumor, it may be introduced by injection into the 
tumor or into tissues surrounding the tumor. 






In another embodiment, the therapeutic compound can be delivered in a vesicle, in particular 
a liposome [see Langer, Science 249:1527-1533 (1990); Treat et al., in Liposomes in the
Therapy of Infectious Disease and Cancer, Lopez-Berestein and Fidler (eds.), Liss: New
York, pp. 353-365 (1989); Lopez-Berestein, ibid., pp. 317-327; see generally ibid.]. To
reduce its systemic side effects, this may be a preferred method for introducing endonuclease
III.



In yet another embodiment, the therapeutic compound can be delivered in a controlled
release system. For example, the polypeptide may be administered using intravenous
infusion, an implantable osmotic pump, a transdermal patch, liposomes, or other modes of
administration. In one embodiment, a pump may be used [see Langer, supra; Sefton, CRC
Crit. Ref. Biomed. Eng. 14:201 (1987); Buchwald et al., Surgery 88:507 (1980); Saudek et
al., N. Engl. J. Med. 321:574 (1989)]. In another embodiment, polymeric materials can be
used [see Medical Applications of Controlled Release, Langer and Wise (eds.), CRC Press:
Boca Raton, Florida (1974); Controlled Drug Bioavailability, Drug Product Design and
Performance, Smolen and Ball (eds.), Wiley: New York (1984); Ranger and Peppas, J.
Macromol. Sci. Rev. Macromol. Chem. 23:61 (1983); see also Levy et al., Science 228:190
(1985); During et al., Ann. Neurol. 25:351 (1989); Howard et al., J. Neurosurg. 71:105
(1989)]. In yet another embodiment, a controlled release system can be placed in proximity
of the therapeutic target, i.e., the brain, thus requiring only a fraction of the systemic dose
[see, e.g., Goodson, in Medical Applications of Controlled Release, supra, vol. 2, pp. 115-
138 (1984)]. Preferably, a controlled release device is introduced into a subject in proximity
of the site of inappropriate immune activation or a tumor.

Other controlled release systems are discussed in the review by Langer [Science 249:1527-
1533 (1990)].






In a further aspect, recombinant cells that have been transformed with the endonuclease III
gene and that express high levels of the polypeptide can be transplanted in a subject in need
of endonuclease III polypeptide. Preferably autologous cells transformed with endonuclease
III are transplanted to avoid rejection; alternatively, technology is available to shield non-
autologous cells that produce soluble factors within a polymer matrix that prevents immune
recognition and rejection.

Thus, the endonuclease III polypeptide can be delivered by intravenous, intraarterial,
intraperitoneal, intramuscular, or subcutaneous routes of administration. Alternatively, the
endonuclease III polypeptide, properly formulated, can be administered by nasal or oral
administration. A constant supply of endonuclease III can be ensured by providing a
therapeutically effective dose (i.e., a dose effective to induce metabolic changes in a subject)
at the necessary intervals, e.g., daily, every 12 hours, etc. These parameters will depend on
the severity of the disease condition being treated, other actions, such as diet modification,
that are implemented, the weight, age, and sex of the subject, and other criteria, which can be
readily determined according to standard good medical practice by those of skill in the art.

A subject in whom administration of endonuclease III is an effective therapeutic regimen for
a dysproliferative disease is preferably a human, but can be any animal. Thus, as can be
readily appreciated by one of ordinary skill in the art, the methods and pharmaceutical
compositions of the present invention are particularly suited to administration to any animal,
particularly a mammal, and including, but by no means limited to, domestic animals, such as
feline or canine subjects, farm animals, such as but not limited to bovine, equine, caprine,
ovine, and porcine subjects, wild animals (whether in the wild or in a zoological garden),
research animals, such as mice, rats, rabbits, goats, sheep, pigs, dogs, cats, etc., avian species,
such as chickens, turkeys, songbirds, etc., i.e., for veterinary medical use.

Turning now to the specific aspects of the experiments that resulted in the discovery of the
present invention, the purification of a mammalian endonuclease III-like enzyme from calf
thymus was undertaken by monitoring its DNA glycosylase activity against UV induced
pyrimidine hydrates. The assay measures release of [3H]-labeled pyrimidine hydrates and is
reproducible and linear with respect to time and protein concentration. The substrate is
easily prepared and, most importantly, the chemical identity of the enzymatically released
photoproducts can be corroborated by HPLC analysis. Calf thymus was chosen as the source
of enzyme because it contains endonuclease III-like activity and because large amounts of
very fresh tissue are available.



A novel approach to the definitive identification of the mammalian enzyme was the
application of a chemical reaction which results in the irreversible cross linking of the
enzyme to its DNA substrate. N-acylimine (Schiff's base) enzyme substrate (ES)
intermediates are characteristic of the prokaryotic DNA glycosylase/AP lyases described to
date. Such intermediates can be irreversibly stabilized through chemical reduction to
secondary amines. In such a way T4 endonuclease V (Dodson et al., 1993) and the E. coli
Fpg protein (Tchou and Grollman, 1995) were irreversibly cross-linked to substrate
oligodeoxynucleotides containing a cyclobutane dimer and an 8-oxoguanine residue,
respectively. The reductive cross linking of enzyme to an oligodeoxynucleotide permits
identification of the mammalian protein by two experimental parameters. The first is an
increase in the apparent molecular mass of the enzyme as determined by SDS-PAGE.
Second, if the oligodeoxynucleotide is 5'-end labeled with 32P, the irreversibly cross linked
protein-DNA complex can be detected by autoradiography or phosphorimaging after SDS-
PAGE.



On the basis of the results obtained with endonuclease V and the Fpg protein we anticipated
successful irreversible cross linking of E. coli endonuclease III to an oligodeoxynucleotide
containing one of the enzyme's known substrates, thymine glycol. Assuming that the
mammalian enzyme also functions through an N-acylimine ES intermediate, we could then
apply the reductive cross linking reaction to the purified mammalian enzyme fractions using
the same oligodeoxynucleotide. This would permit isolation of the correct protein species
from an SDS-polyacrylamide gel in sufficient amount for primary amino acid sequencing.

EXAMPLE I

Experimental Procedures
Buffers. Homogenization Buffer: 25 mM HEPES, pH 7.5, 15 mM NaCl, 1 mM DTT, 2 mM
EDTA, 0.5 mg/mL Leupeptin, 0.7 mg/mL Pepstatin, 0.2 mM phenylmethylsulfonyl fluoride.
HDE: 25 mM HEPES, pH 7.5, 1 mM DTT, 2 mM EDTA.

Enzyme. E. coli endonuclease III was purified from E. coli strain UC6444 carrying the
plasmid pHIT1 as previously described [Asahara, et al., Biochemistry, 28:4444-4449
(1989)].

Radionucleotides. [5,5'-3H]deoxycytidine-5'-triphosphate (15-30 Ci/mmol) and [methyl-
3H]-thymidine-5'-triphosphate (70-90 Ci/mmol) were purchased from Du Pont/NEN.

Oligodeoxynucleotides. Alternating poly(dG-dC) and poly(dA-dT) were purchased from
Pharmacia. 

Purification of a Pyrimidine Hydrate DNA-glycosylase from Calf Thymus. All purification
procedures were carried out at 4 °C, unless otherwise indicated. 1.2 kg of freshly obtained
calf thymus was homogenized in a Waring Blendor in 4.8 L of homogenization buffer and
further fragmented by sonication in 300 mL aliquots for 3 min at 70% power using a Heat
Systems model W-375 sonicator equipped with a model 305 high gain horn. 4 M NaCl was
added to a final concentration of 320 mM and the gelatinous precipitate removed manually
by spooling using a 10 mL glass pipette as a stirring rod. The remaining solution was cleared
by centrifugation at 10,000 x g, filtered through cheesecloth and diluted with 1.7 volumes of
HDE to produce Fraction I (4000 ml).

Fraction I was batch extracted with 450 ml (packed volume) of cation exchange resin (SP
Fast-flow, Pharmacia) pre-equilibrated with HDE containing 150 mM NaCl. After the beads
settled the supernatant was discarded and the beads poured into an XK 26/60 column
(Pharmacia). They were washed with 500 ml of HDE containing 150 mM NaCl, followed by
a 2 L gradient from 150-700 mM NaCl at 4 ml/min. Twenty ml column fractions were
collected and assayed. Fractions 45-75 were pooled to yield Fraction II (620 ml).

Solid ammonium sulfate was added to Fraction II, which contained approximately 350 mM
NaCl, to a final saturation of 21% (120 g/L solution). The sample was centrifuged at 12,000 x
g for 20 min to remove precipitate and the supernatant applied to a C 26/40 column
(Pharmacia), containing 150 mL (bed volume) of Octylsepharose 4 Fast-flow media
(Pharmacia) pre-equilibrated with HDE, 21% ammonium sulfate, 300 mM NaCl. The
column was washed with 150 ml of HDE, 21% ammonium sulfate, 300 mM NaCl followed
by a 1.5 L gradient beginning with HDE, 21% ammonium sulfate, 300 mM NaCl and
finishing with HDE containing neither ammonium sulfate nor NaCl at 3 mL/min, collected in
20 mL fractions. One mL aliquots of the column fractions were dialyzed into HDE, 125 mM
NaCl, and assayed for enzymatic activity. Active fractions (31-44) were pooled and dialyzed
into HDE, 125 mM NaCl (Fraction III, 280 mL).


Fraction III was concentrated by loading onto an HR 10/10 Mono S column (Pharmacia) and
eluting via a step increase in NaCl concentration to HDE, 0.5 M NaCl. One mL fractions
were collected and assayed and 12 active fractions were pooled. The 12 mL sample was
divided into 3 x 4 mL aliquots, each of which was fractionated via gel filtration
chromatography through a Hiload 26/60, Superdex 75 pg column (Pharmacia), run in HDE,
350 mM NaCl (2.5 mL/min) and collected in 2.5 mL fractions. The gel filtration column was
pre-calibrated with the Gel Filtration Low Molecular Weight Calibration Kit from
Pharmacia. Active fractions (70-75, Mr = approximately 29 kD) from each of 3 column runs
were pooled to 45 mL which was diluted from 350 mM NaCl to 125 mM NaCl with 1.8
volumes of HDE. The sample was then loaded onto a HR 5/5 Mono S column (Pharmacia)
and concentrated via step elution with HDE, 0.5 M NaCl. Enzymatic activity eluted in six
0.5 mL fractions which were pooled to yield Fraction IV (3 mL).

Fraction IV was diluted to 100 mM NaCl with 4 volumes of HDE, loaded onto a 1 mL single
stranded DNA-cellulose (ssDNA-cellulose, Sigma) HR 5/5 column (Pharmacia) and eluted
with a 12.5 mL gradient (100-600 mM NaCl) (0.2 mL/min). Fractions 15-17 were pooled to
yield Fraction V (1.5 mL).
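
The salt adjustments and pooled volumes quoted in the purification steps follow from simple dilution arithmetic, which may be checked with the short sketch below. The function name is introduced only for this illustration, and the calculation assumes that the diluent (HDE) contains no NaCl, as its stated composition indicates.

```python
# Dilution check for the purification steps; HDE is taken to contain no NaCl.
def diluted_concentration(start_mM: float, volumes_added: float,
                          diluent_mM: float = 0.0) -> float:
    """Concentration after adding `volumes_added` volumes of diluent to 1 volume."""
    return (start_mM + volumes_added * diluent_mM) / (1.0 + volumes_added)


# Gel filtration pool: 350 mM NaCl diluted with 1.8 volumes of HDE.
print(round(diluted_concentration(350, 1.8), 1))   # 125.0
# Fraction IV: 0.5 M NaCl diluted with 4 volumes of HDE.
print(round(diluted_concentration(500, 4), 1))     # 100.0
# Pooled gel filtration volume: fractions 70-75 (6 x 2.5 mL) from 3 runs.
print(6 * 2.5 * 3)                                  # 45.0
```
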

Preparation of Substrates for DNA-Glycosylase Assays. Poly(dG-[3H]dC) was produced as
described previously [Boorstein, et al., 1989, supra], by nick translation of poly(dG-dC)
(Pharmacia) with [5,5'-3H]dCTP (Du Pont/NEN), and purified using Nick-Spin columns
(Pharmacia). Poly(dG-[3H]dC) produced in this manner had a specific activity of I "^xlO^
cpm/ug. This DNA was then exposed to 400 kJ/m2 of UV radiation at 254 nm (two 15 Watt
germicidal bulbs) to induce the formation of cytosine hydrate. UV flux was quantitated
using a UVX 54 radiometer (UVP Inc., San Gabriel, CA).

Poly(dA-[3H]dT) was produced by the nick-translation of poly(dA-dT) with [methyl-
3H]dTTP, followed by oxidation of the alternating copolymer with osmium tetroxide to form
thymine glycol residues [Higgins, et al., 1987, supra]. The radiolabeled, oxidized DNA was
purified by passing it twice through Nick-Spin columns (Pharmacia). Thymine glycol-
containing poly(dA-[3H]dT) produced in this manner had a specific activity of
approximately 7x10^ cpm/ug.

DNA-Glycosylase Assays. Pyrimidine hydrate and thymine glycol DNA-glycosylase assays
were carried out against UV-irradiated and oxidized DNA substrates, respectively, as follows:
enzyme aliquots were incubated with 0.1 ug of substrate DNA in a reaction mixture
containing 15 mM HEPES, pH 7.5, 75 mM NaCl, 10 mM EDTA, and 1 mM DTT in a
volume of 60 uL for specified periods of time up to 3 h at 37 °C. Reactions were terminated
by the addition of 25 uL of 25 mg/mL BSA and 2 mL acetone, which precipitated both the
protein and DNA, leaving in solution only the free modified bases which had been
enzymatically cleaved from the DNA backbone. After centrifugation at 8000 x g for 15 min,
the supernatant was dried, resuspended in water and analyzed by liquid scintillation counting.
At each step the chemical identity of the released radioactive product was proven to be
cytosine by HPLC. The free cytosine hydrate released by the enzyme is unstable, rapidly
eliminating water, and is recovered as free cytosine (Boorstein et al., 1989). One unit of
enzyme released 1 pmole of cytosine hydrate from 0.1 ug of UV-irradiated poly(dG-[3H]dC)
in 1 min. Enzyme assays lasted from 15 min to 3 h, depending upon the specific activity of
the enzyme during the different phases of the purification.

[3H]Thymine glycol released from the oxidized poly(dA-[3H]dT) was identified by HPLC
as previously described [Higgins, et al., 1987, supra].

AP nicking assay. AP-site containing DNA was prepared and nicking activity assayed as
described previously [Cunningham, et al., 1985, supra]. The assay is done in 10 mM EDTA
to preclude any Mg2+-dependent AP endonuclease from acting on the substrate.

Preparation of Thymine Glycol-containing Oligodeoxynucleotide for Cross Linking Studies.
Thymine glycol-containing single stranded oligodeoxynucleotide was prepared as described
previously [Kao, et al., J. Biol. Chem., 268:17787-17793 (1993)]. The oxidation was carried
out on 50 OD260 of d(CGCGATACGCC) (SEQ ID NO:5). The complementary 11 mer was
synthesized by conventional means.
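
For reference, the sequence of the complementary 11 mer follows directly from SEQ ID NO:5 by Watson-Crick pairing. The sketch below derives it and confirms that the substrate strand carries a single thymine, the only position available for conversion to thymine glycol; the helper name is introduced for this illustration.

```python
# Derive the complementary 11-mer for d(CGCGATACGCC) (SEQ ID NO:5).
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


substrate = "CGCGATACGCC"
complement = reverse_complement(substrate)
print(complement)            # GGCGTATCGCG
print(substrate.count("T"))  # 1
```
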

Cross linking of Enzyme to Oligodeoxynucleotide. Twenty pmoles of the appropriate
oligodeoxynucleotide, either thymine glycol-containing or complementary, was 5'-end
labeled using T4 kinase (Gibco BRL) and [32P] γ-ATP, according to the manufacturer's
recommendations, and purified using a Nuc Trap Push Column (Stratagene) pre-equilibrated
in 20 mM HEPES, pH 7.5, 50 mM NaCl, 5 mM EDTA. The radiolabeled
oligodeoxynucleotide was then combined with 200 pmoles of non-radioactive
oligodeoxynucleotide, and the complementary strand added at a 1:1 ratio and placed on ice
for 30 min. Enzyme was reacted with the substrate double-stranded oligodeoxynucleotide in
a total volume of 300 uL under the following reaction conditions: 37.3 mM NaCNBH3, 20
mM HEPES, pH 7.5, 46.5 mM KCl, 5 mM EDTA, 1.5 uM oligodeoxynucleotide, 15 ng/uL
protein. In the case of E. coli endonuclease III, this represented a 4 fold molar excess of
substrate oligodeoxynucleotide to enzyme. After incubation at 37 °C for 2 h, samples were
quick frozen on dry ice, lyophilized, resuspended and boiled in 35 uL of 1X SDS-PAGE
loading buffer, and separated by electrophoresis on a 15% Tricine-SDS gel. Following
electrophoresis, the gel was stained with Coomassie Blue, wrapped in plastic, and analyzed
via phosphorimaging.

Gel Electrophoresis. All samples were lyophilized to dryness, and resuspended in standard
SDS loading buffer prior to electrophoresis. Fifteen percent Tricine gels were prepared
[Shagger, et al., Anal. Biochem., 166:368-379 (1987)] and run using the Mini-Protean II
electrophoresis system (Bio-Rad). Gels were run at 90 V for approximately 5 h, completion
being determined by the progress of pre-stained low molecular weight electrophoresis
standards (Bio-Rad). Gels were then stained with Coomassie Blue.

Amino acid sequence analysis. Fractions from the ssDNA cellulose column (Fraction V)
were run on a 15% Tricine-SDS gel and stained with Coomassie Blue. The predominant
band, identical to the band which shifted after reductive coupling to the thymine glycol-
containing oligodeoxynucleotide, was excised from the gel and sent to the W. M. Keck
Foundation microsequencing facility at Yale University, New Haven, CT.

At Yale, the protein was subjected to proteolytic digestion followed by purification on HPLC
using a reverse phase microbore C18 column. Individual peaks were assayed for purity by
laser desorption mass spectroscopy. After a 16 h hydrolysis, amino acid analysis was carried
out on a Beckman Model 6300 ion-exchange instrument [Rosenfeld, et al., Anal. Biochem.,
203:173-179 (1992); Elliott, et al., Anal. Biochem., 211:94-101 (1993); Williams and Stone,
Techniques in Protein Chemistry VI, 143-152 (1995); Williams, et al., Protein Protocol
Handbook, (1995)].

The sequence homologies were obtained via the BLAST [Altschul, et al., J. Mol. Biol.
215:403-410 (1990)] Network Service of the National Center for Biotechnology Information
which accesses the Brookhaven, Swiss, PIR and GenBank data bases.

RESULTS

Purification of the mammalian enzyme. A mammalian homologue of E. coli endonuclease
III was purified from fresh calf thymus on the basis of its pyrimidine hydrate DNA-
glycosylase activity. After the final purification step, ssDNA-cellulose chromatography, the
enzyme was purified approximately 5,000-fold as estimated by the specific activity of the
pyrimidine hydrate DNA-glycosylase and the yield was approximately 1%. This is set forth
in Table 1 below, and in Figure 1.

Table 1: Summary of Purification of a Pyrimidine Hydrate DNA-Glycosylase from Calf Thymus*

fraction   total protein   volume   total activity   specific activity   purification   yield
           (mg)            (mL)     (pmol/min)       (pmol/min/mg)       (fold)         (%)
I          52000           4000     2920             0.056
II         2420            720      1020             0.421               7.5            34.9
III        264             350      370              1.40                25             12.7
IV         1.3             3.0      36               27.7                495            1.2
V          0.1             1.5      29               290                 5180           1.0


* Purification steps and fractions are described in the text. 
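
The derived columns of Table 1 can be recomputed from the total protein and total activity columns. The following sketch does so, taking Fraction I as the baseline; the data structure and function name are introduced for this illustration, and specific activity is assumed to be total activity divided by total protein.

```python
# Recompute specific activity, fold purification and yield from Table 1.
# Each entry: (fraction, total protein in mg, total activity in pmol/min).
FRACTIONS = [
    ("I", 52000, 2920),
    ("II", 2420, 1020),
    ("III", 264, 370),
    ("IV", 1.3, 36),
    ("V", 0.1, 29),
]


def purification_summary(fractions):
    """Return per-fraction specific activity, fold purification and % yield."""
    _, base_protein, base_activity = fractions[0]
    base_specific = base_activity / base_protein
    rows = []
    for name, protein, activity in fractions:
        specific = activity / protein
        rows.append({
            "fraction": name,
            "specific_activity": specific,
            "fold": specific / base_specific,
            "yield_percent": 100.0 * activity / base_activity,
        })
    return rows


rows = purification_summary(FRACTIONS)
for r in rows:
    print(f"{r['fraction']:>3}  {r['specific_activity']:8.3f}  "
          f"{r['fold']:8.1f}  {r['yield_percent']:6.1f}")
```

With the unrounded Fraction I specific activity as the baseline, Fraction V comes out at roughly 5160-fold; the tabulated value of 5180 corresponds to dividing by the rounded baseline of 0.056. Either way the result agrees with the approximately 5,000-fold purification and 1% yield stated in the text.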





Co-elution of DNA-glycosylase activities. Successive fractions from the ssDNA-cellulose
column were assayed simultaneously for pyrimidine hydrate and thymine glycol DNA-
glycosylase activities, both of which have been demonstrated for endonuclease III [Higgins,
et al., 1987, supra; Boorstein, et al., 1989, supra]. Figure 2A documents the coelution of the
two activities.



Co-elution of DNA-glycosylase and Mg2+-Independent AP Site Nicking Activity.
Comparable ssDNA-cellulose purified material from another calf thymus preparation was
assayed simultaneously for Mg2+-independent AP-nicking activity and pyrimidine hydrate
DNA-glycosylase activity, both of which are also previously documented activities of E. coli
endonuclease III [Cunningham, et al., 1985, supra]. The coelution of these two activities is
shown in Figure 2B.

Estimation of the Molecular Weight of the Mammalian Enzyme. The molecular radius of the
mammalian DNA-glycosylase, as determined by gel filtration, was approximately 29 kD.
Although ssDNA-cellulose fractions with peak enzymatic activity contained more than 1
protein species, a predominant band of apparent molecular mass of 31 kD was present on
SDS-PAGE analysis. Moreover, when 25 uL aliquots of successive ssDNA-cellulose column
fractions were subjected to electrophoresis and stained with Coomassie Blue, the elution
profile of this predominant 31 kD species, as judged by the intensity of staining,
corresponded to that of the two DNA-glycosylase activities (Figure 2C).

Reductive Cross Linking of the Enzymes to a Thymine Glycol-Containing DNA
Oligodeoxynucleotide. Incubation of purified E. coli endonuclease III with duplex DNA
(the thymine glycol-containing oligodeoxynucleotide annealed to its complementary strand)
in the presence of NaCNBH3 resulted in an increase in the apparent molecular mass of the
enzyme as determined by SDS-PAGE (Figure 3A). Lane 2 demonstrates endonuclease III
incubated with substrate DNA in the absence of NaCNBH3 and Lane 3 in the presence of
NaCNBH3. The increase in the apparent molecular mass of the endonuclease III is the
result of irreversible cross linking of the enzyme to the oligodeoxynucleotide.

The reductive cross linking reaction was also performed on the most purified preparation of
the calf thymus pyrimidine hydrate DNA-glycosylase. A 75 uL aliquot of fraction 17 eluted
from the ssDNA cellulose column (Figure 2A), containing purified enzyme of maximal
specific activity, was incubated with the thymine glycol-containing oligodeoxynucleotide in
the presence of NaCNBH3 along with appropriate controls. Lanes 4 and 5 represent ssDNA
fraction 17 incubated with NaCNBH3 and no substrate DNA, and with substrate DNA in the
absence of NaCNBH3, respectively. The apparent molecular mass of 31 kD, as first shown in
Figure 2C, did not change under either of these incubations. However, when the reaction
mixture contained both substrate DNA and NaCNBH3, the predominant 31 kD Coomassie
Blue-stained band shifted to an apparent molecular mass of 35 kD, as shown in lanes 6 and 7.
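
The size of the observed shift is in line with covalent attachment of a single 11 mer strand. The sketch below makes the estimate, assuming an average mass of about 330 Da per nucleotide of single stranded DNA; that figure is a common rule of thumb and is not taken from the present disclosure, and migration of protein-DNA adducts on SDS-PAGE is only approximately proportional to mass.

```python
# Estimate the apparent mass of enzyme cross linked to one 11-mer strand.
AVG_NT_MASS_DA = 330.0  # assumed average mass per ssDNA nucleotide (rule of thumb)


def crosslinked_mass_kd(protein_kd: float, oligo_length_nt: int) -> float:
    """Protein mass plus the estimated mass of one attached oligonucleotide, in kD."""
    return protein_kd + oligo_length_nt * AVG_NT_MASS_DA / 1000.0


estimate = crosslinked_mass_kd(31.0, 11)
print(round(estimate, 2))  # 34.63
```

An estimate of about 34.6 kD compares well with the apparent molecular mass of 35 kD seen for the shifted band.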

As an additional control, Fraction 8 eluting from the ssDNA cellulose column (Figure 2A),
which contained no enzymatic activity, was also exposed to the conditions of the coupling
reaction. Lanes 8 and 9 of Figure 3A contain protein from this fraction incubated with the
oligodeoxynucleotide in the presence or absence of NaCNBH3. As can readily be seen, no
shift of the visible protein bands occurred.






Additional proof of irreversible cross linking is demonstrated in Figure 3B. The thymine
glycol-containing oligodeoxynucleotide had been 5'-end-labeled with 32P prior to the
coupling reactions. A phosphorimage of the gel in Figure 3A demonstrated only 2 bands.
Figure 3B shows a single band in lane 3 which corresponds to the position of the shifted
cross linked endonuclease III. The single band in lane 6 corresponds to the position of the
predominant Coomassie Blue-stained species from calf thymus, which had also shifted after
cross linking. Under the denaturing conditions (boiling) used to prepare samples for the SDS
gel, the complementary oligodeoxynucleotide strand does not remain associated with the
protein. When the complementary, rather than the thymine glycol-containing,
oligodeoxynucleotide was 32P-labeled, the protein shifted (Figure 3A, lane 7), but did not
appear on the phosphorimage (Figure 3B, lane 7). Thus, lanes 6 and 7 of Figure 3B prove
that the bovine enzyme cross linked only to the oligodeoxynucleotide containing the thymine
glycol residue, thereby confirming that the irreversible cross linking resulting from chemical
reduction is exclusively dependent upon formation of a specific ES intermediate. There was
no evidence of any binding of oligodeoxynucleotide to the proteins which did not contain
enzyme activity, further corroborating that the reductive cross linking reaction was
absolutely specific.



Amino Acid Sequence Data. Four peptides derived from a proteolytic digest of the purified
bovine protein were sequenced, yielding sequences of 14, 15, 22 and 23 amino acids. None
of these sequences demonstrated direct similarity to E. coli endonuclease III by initial
BLAST analysis. However, the 22 amino acid peptide sequence demonstrated considerable
similarity to a portion of two predicted full length protein sequences, from C. elegans (Acc.
no. Z05874) [Wilson, et al., Nature, 368:32-38 (1994)] with P(N)=0.00053 and S.
cerevisiae (Acc. no. L05146) with P(N)=0.0063. Both the C. elegans and S. cerevisiae
proteins, in turn, bear similarity to E. coli endonuclease III (Acc. no. J02857). When
compared with the sequence of endonuclease III via BLAST, the C. elegans and the S.
cerevisiae sequences yielded P(N) values of 9.1x10 -- and 1.9x10"\, respectively. This same
bovine polypeptide demonstrates an even greater degree of similarity to two recently
submitted partial 3' cDNA sequences, from H. sapiens (Acc. no. F04657) with P(N)= 6.8x10'
^ and Rattus sp. (Acc. no. H33255) with P(N)= 1.8x10"^
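
The P(N) values quoted above are probabilities of a chance match. As a rough illustration, the Python sketch below converts between such a probability and the expected number of chance matches E, assuming the Poisson relation P = 1 - exp(-E) that is commonly used in BLAST statistics. The function names are hypothetical and the sketch does not reproduce the BLAST program; it only shows that P and E are nearly identical for small values, whereas for large expect values such as 10 or 100 the probability saturates near 1.

```python
import math


def p_from_expect(e_value: float) -> float:
    """Probability of at least one chance match, assuming P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


def expect_from_p(p_value: float) -> float:
    """Inverse of the relation above: E = -ln(1 - P)."""
    return -math.log1p(-p_value)


if __name__ == "__main__":
    # P(N) values taken from the text for the 22 amino acid bovine peptide.
    for label, p in [("C. elegans", 0.00053), ("S. cerevisiae", 0.0063)]:
        print(f"{label}: P(N) = {p:g} corresponds to E = {expect_from_p(p):.6f}")
    # Large expect values correspond to probabilities very close to 1.
    for e in (10, 100):
        print(f"E = {e}: P = {p_from_expect(e):.6f}")
```

Under this assumed relation, the small P(N) values reported for the peptide matches can be read almost directly as expected numbers of chance matches.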

Figure 4 demonstrates the alignment of the E. coli endonuclease III amino acid sequence
with the primary amino acid sequences of the bovine polypeptides, and the predicted amino
acid sequences of the C. elegans, H. sapiens, and Rattus sp. proteins derived by translation of
their respective nucleotide sequences. The boldface sequence marked with the asterisk
represents the 22 amino acid bovine polypeptide found to be most similar to the C. elegans
and the H. sapiens and Rattus sp. sequences, as determined by the BLAST program using
the default Blosum 62 as the algorithmic matrix and the default expected cutoff value of 10.
The other peptides were aligned by the BLAST program after we raised the expected cutoff
value from 10 to 100.

A mammalian pyrimidine hydrate DNA-glycosylase was purified 5,000-fold from calf
thymus (summarized in Table I). The most purified fractions, after elution from ssDNA
cellulose, also demonstrated thymine glycol DNA-glycosylase and AP lyase activities. The
AP nicking assay, performed in the presence of 10 mM EDTA, has previously been shown to
specifically correspond to β-elimination [Mazumder, et al., 1991, supra]. The elution
profiles of the two glycosylase activities and the AP lyase activities were superimposable
(Figure 2A, 2B). The fact that these activities co-eluted in the most purified calf thymus
fractions strongly suggests that they are contained within the same protein.

When identical volumes of successive column fractions were analyzed by SDS-PAGE, there
was strong correspondence between the intensity of staining of a predominant 31 kD band in
the active fractions and the enzyme activities, but several other protein species were also
present, any of which could have represented the bovine pyrimidine hydrate-thymine glycol
DNA-glycosylase/AP lyase.

Therefore, to definitively identify the bovine enzyme we took advantage of a reductive cross
linking reaction which had already been applied to T4 endonuclease V and the E. coli Fpg
protein. It was first demonstrated that, in the presence of NaCNBH3, purified E. coli
endonuclease III would form a stable cross link to an oligodeoxynucleotide containing one of
its substrates, thymine glycol. The apparent increase in molecular weight of the purified
enzyme (Figure 3A, lane 3) together with the phosphorimaging data (Figure 3B, lane 3)
demonstrated unequivocally that the bacterial enzyme was irreversibly cross-linked to the
substrate oligodeoxynucleotide. Thus, E. coli endonuclease III was cross linked to a
substrate DNA oligodeoxynucleotide in a manner analogous to T4 endonuclease V and the E.
coli Fpg protein, confirming that it also functions via an N-acylimine ES intermediate.

The same reaction was applied to the most purified bovine enzyme fraction and showed that
only the predominant 31 kD protein species was irreversibly cross linked to the same
thymine glycol-containing oligodeoxynucleotide. The specificity of the reaction was
confirmed by separately 5'-end labeling either the thymine glycol-containing or
complementary strand of the substrate double-stranded oligodeoxynucleotide. The increase
in the apparent molecular mass of the 31 kD protein occurred independently of which DNA
strand was labeled; however, only when the thymine glycol-containing oligodeoxynucleotide
was labeled did a band corresponding to the shifted protein appear on the phosphorimage of
the gel (Figure 3B). That this strategy enabled the successful identification of the bovine analog
of E. coli endonuclease III in a relatively complex mixture of mammalian proteins was
contingent on the fact, unproven until now, that the mammalian enzyme and the bacterial
enzyme both function through an N-acylimine ES intermediate.

The primary amino acid sequence data confirm that the purified 31 kD protein species we
identified by reductive cross-linking is a mammalian homologue of endonuclease III. The
aligned sequences of Figure 4 demonstrate the homology between the bovine and C. elegans
proteins extending into the region which constitutes the iron-sulfur cluster of E. coli
endonuclease III [Thayer, et al., EMBO J., 14:4108-4120 (1995)]. This iron-sulfur cluster
motif contains 4 cysteine residues at endonuclease III positions 187, 194, 197 and 203 and has
been shown to be a DNA binding domain. The H. sapiens and Rattus sp. partial 3' cDNA
sequences also contain four cysteine residues which align with those of E. coli endonuclease
III and the C. elegans sequence. Thus, it seems probable that the E. coli, C. elegans and
mammalian enzymes all share a common mode of DNA binding. A second bovine peptide,
and the C. elegans predicted protein, both align with a region containing a known active site
amino acid of endonuclease III, aspartic acid 138 (Figure 4, bold, italics). Another critical
active site residue in E. coli endonuclease III is lysine 120, which probably contributes the
ε-amino group necessary for the formation of the N-acylimine ES intermediate [Thayer, et al.,
1995, supra]. Since we have demonstrated such an ES intermediate for the bovine enzyme, it
is probable that all mammalian DNA glycosylase/AP lyases will prove to have a lysine
residue as part of their active sites. In conclusion, given the similarities in amino acid
sequence, including active sites and DNA binding domains, among the E. coli endonuclease
III, the purified bovine enzyme and the predicted sequences of the C. elegans, H. sapiens
and Rattus sp. proteins, there may be a homologous family of endonuclease III-like DNA
repair enzymes present throughout phylogeny.

EXAMPLE 2 



This example describes the development of the human cDNA corresponding to the bovine
enzyme, the purification of which was described above and is briefly reviewed below.

The DNA repair enzyme was purified 5,000-fold from calf thymus. The enzyme purification
was monitored by the assay described above, which measures DNA glycosylase activity against
UV irradiated poly(dG-[3H]dC). After the last step of purification, the preparation was
analyzed by SDS-PAGE and a predominant 31 kD species appeared in fractions containing
enzyme activity. The identity of this species was confirmed via reductive cross-linking of
the enzyme protein to its DNA substrate, a reaction resulting from the reduction of the
unstable enzyme-substrate intermediate to a stable secondary amine.

The protein band was cut from the gel and subjected to sequencing analysis by standard
techniques. The primary amino acid sequence of five polypeptides was determined. One of
the peptide sequences (LWSEINGLLVGFGQQTCLPIRP) (SEQ ID NO:28) was found to be
homologous to the translated sequences of two 3' ESTs submitted to the gene bank in
September 1995, a H. sapiens sequence (Accession No. F04657) and a sequence from Rattus
sp. (Accession No. H33255).






An oligonucleotide (dGTGGCACGAGATCAATGGACTCTTG) (SEQ ID NO:4)
corresponding to a portion of the human EST was used as a probe to isolate a full length
cDNA clone from a Superscript Human spleen cDNA library using the Gene Trapper cDNA
positive selection system (GIBCO/Life Technologies). This system facilitates positive
selection of clones from a cDNA library, using biotinylated sequence-specific
oligonucleotides and magnetic streptavidin beads to enrich the library prior to screening.
Nucleotide sequencing yielded the full length cDNA set forth in Figure 5, and in turn,
revealed regions of homology to the primary amino acid sequences of the four other bovine
polypeptides. An open reading frame encoding a protein of 292 amino acids was identified.
In vitro translation of the full length cDNA sequence has resulted in the synthesis of a protein
with an apparent molecular mass of 37 kD on SDS-PAGE which has DNA glycosylase
enzyme activity as measured using UV irradiated poly(dG-[3H]dC) as substrate.
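
The correspondence between the probe and the sequenced bovine peptide can be checked by conceptual translation. The minimal Python sketch below translates the probe sequence given above with the standard genetic code. The reading frame (skipping the first base) is an assumption made here because it yields a peptide resembling residues 2 to 9 of the bovine peptide, and the helper names are illustrative only.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA sense strand, ignoring any incomplete trailing codon."""
    seq = dna.upper()[frame:]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def count_identities(a: str, b: str) -> int:
    """Count positions at which two ungapped peptides carry the same residue."""
    return sum(x == y for x, y in zip(a, b))


PROBE = "GTGGCACGAGATCAATGGACTCTTG"    # probe sequence as given in the text
BOVINE_RESIDUES_2_TO_9 = "WSEINGLL"   # from the 22 amino acid bovine peptide

if __name__ == "__main__":
    human = translate(PROBE, frame=1)  # assumed reading frame
    matches = count_identities(human, BOVINE_RESIDUES_2_TO_9)
    print(human, f"{matches} of {len(human)} residues identical")
```

In this assumed frame the probe encodes eight residues, seven of which are identical to the corresponding stretch of the bovine peptide.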

Discussion

This enzyme recognizes pyrimidines which have undergone oxidation of the 5,6 double bond
(such as thymine glycol) and pyrimidines which have undergone hydration as a result of
exposure to UV. Such modified pyrimidines are both toxic and premutagenic, and the
phenotype of E. coli mutants lacking the homologous enzyme activity is that of a
hypermutator.

It is quite likely that the enzyme is an antimutator in mammalian cells and, therefore, may be
considered to function as a tumor suppressor gene. In heterozygous individuals, loss of the
remaining allele would result in the formation of a clone of cells lacking repair capacity.
Such cells, exposed to oxidative or UV stress, could not repair the premutagenic modified
pyrimidines and, accumulating mutations at an increased rate, could develop into a neoplasm.
With appropriate nucleotide primers or probes derived from the sequence of the human gene
product, this hypothesis can be demonstrated. This could prove useful in prognosis for
heterozygous individuals in whom preventive therapy could be instituted.

EXAMPLE 3

Introduction

DNA glycosylase/AP lyases function through N-acylimine (Schiff's base) ES intermediates
[Dodson, et al., 1994, supra]. Such ES intermediates can be chemically reduced to stable
secondary amines, resulting in irreversible cross-linking of the enzymes to their particular
substrates [Tchou, et al., J. Biol. Chem., 270:11671-11677 (1995); Dodson, et al.,
Biochemistry, 32:8284-8290 (1993); Dodson, et al., 1994, supra; Hilbert, et al.,
Biochemistry, 35:2505-2511 (1996)]. As shown in the previous Examples, this cross-linking
reaction was used to definitively identify a pyrimidine hydrate-thymine glycol DNA
glycosylase/AP lyase purified from calf thymus. Incubation, under reducing conditions, of a
32P-labeled oligodeoxynucleotide containing a single thymine glycol
(5,6-dihydroxy-5,6-dihydrothymine) residue with a 5000-fold purified enzyme
preparation resulted in cross-linking of a predominant 31 kDa protein to the
oligodeoxynucleotide, as determined by SDS-PAGE analysis and phosphorimaging. Tryptic
digestion of this protein, followed by microsequencing of several of the resulting peptides,
demonstrated that the bovine enzyme was homologous to theoretical proteins translated from
the genomic DNA of S. cerevisiae and C. elegans. Both of these theoretical proteins, in turn,
were homologues of E. coli endonuclease III. The bovine peptide amino acid sequences
were also homologous to the translated sequences of 3'ESTs from H. sapiens brain tissue
(Acc. no. F04657) and Rattus sp. PC12 cells (Acc. no. H33255) [Hilbert, et al., 1996, supra].

In the present Example, probes based upon the homologous human 3'EST were used to
isolate clones which encode the human homologue of E. coli endonuclease III from a splenic
cDNA library. Once determined, the cDNA sequence was used to express the enzyme as a
functional recombinant protein, and to determine the chromosomal localization of the human
gene.

Experimental Procedures

Radionucleotides. [α-32P] deoxycytidine 5'-triphosphate (dCTP, 3000 Ci/mmol), [γ-32P]
adenosine 5'-triphosphate (dATP, 3000 Ci/mmol), and [methyl-3H]-thymidine 5'-triphosphate
(TTP, 70-90 Ci/mmol) were obtained from Du Pont/NEN.

Cloning of the cDNA. Oligodeoxynucleotides based upon the human 3'EST sequence (Acc.
no. F04657) were used to isolate homologous clones from a Superscript human spleen cDNA
library in the pCMV-SPORT plasmid vector (GibcoBRL) using the Genetrapper cDNA
positive selection system (GibcoBRL), according to the manufacturer's protocol. Briefly, the
amplified double stranded cDNA library was made single stranded by treatment with the
Gene II product (phage F1) endonuclease and E. coli exonuclease III, and then hybridized to
a biotinylated sense strand specific oligodeoxynucleotide P1 (5'-
GTGGCACGAGATCAATGGACTCTTG) (SEQ ID NO:4). The cDNA-oligodeoxynucleotide
hybrids were captured using streptavidin paramagnetic beads. Non-specifically bound
cDNAs were washed away at high stringency, and specifically bound cDNAs were eluted
from the paramagnetic beads by denaturing the cDNA-oligodeoxynucleotide hybrids.
Selected cDNA clones were then made double stranded by repair, which was primed by a
second sequence specific oligodeoxynucleotide P2 (5'-
ATCAATGGACTCTGGGTGGGC) (SEQ ID NO:7). The selected repaired plasmids were
electroporated into the E. coli strain DH5α and plated onto Lennox L agar plates containing
50 ug/ml ampicillin (LB/amp agar).

After a 20 hour incubation at 37°C, colonies were analyzed for the presence of the desired
cDNA insert via colony PCR, according to the manufacturer's protocol, using a second set of
3'EST specific primers (P3, 5'-CAACAGGCGTGGCTTCCTGAAGCG; and P4, 5'-
GGTGGGCTTCGGCCAGCAGACCTGT) (SEQ ID NOs:8, 9) to maximize specificity of
the selection procedure. PCR was conducted as follows: 1 cycle of 95°C for 2 min; 37 cycles
of 94°C for 1 min, 60°C for 1 min, 72°C for 1 min; followed by a final cycle of 10 min at
72°C. PCR products were then analyzed by electrophoresis in a 1.2% agarose gel. Colonies
which proved positive through the first PCR, by virtue of the production of a 180 bp product,
were subjected to a second round of colony PCR in order to determine the size of the inserts
using T7 and SP6 specific primers (5'-TAATAGGACTGAGTACTATAGGAGA, and 5'-
AGGTATTTAGGTGAGAGTATAG, respectively) (SEQ ID NOs:10, 11). Of the 23 colonies
obtained, 10 proved, through colony PCR and sequencing analysis, to contain the sequence
of interest.
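
The colony PCR cycling conditions described above can be written down as a small data structure, which makes the programmed hold time easy to total. The Python sketch below is illustrative only: the class and function names are hypothetical, and the total excludes temperature ramping, which depends on the thermal cycler used.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: int       # hold temperature in degrees Celsius
    minutes: float    # hold time in minutes


@dataclass(frozen=True)
class Stage:
    cycles: int              # number of times the steps are repeated
    steps: Tuple[Step, ...]


# Colony PCR program as described in the text.
COLONY_PCR = (
    Stage(1, (Step(95, 2),)),                              # initial denaturation
    Stage(37, (Step(94, 1), Step(60, 1), Step(72, 1))),    # denature, anneal, extend
    Stage(1, (Step(72, 10),)),                             # final extension
)


def total_minutes(program: Tuple[Stage, ...]) -> float:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return sum(stage.cycles * sum(step.minutes for step in stage.steps)
               for stage in program)


if __name__ == "__main__":
    print(f"Programmed hold time: {total_minutes(COLONY_PCR):g} min")
```

For the program above, the hold times sum to 2 + 37 x 3 + 10 = 123 minutes.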

Isolation of longer cDNA clones via a second GENETRAPPER selection. In order to isolate
additional cDNA clones which contained long inserts and thus had a higher probability of
containing the full-length cDNA sequence, the GENETRAPPER cDNA selection system was
used a second time, substituting a second set of oligodeoxynucleotides for capture (P5; 5'-
AGAGAGACTGGGTGTGGGGTATGAG), and repair (P6; 5'-
AAGAGAGCGTGCAGCAGAAGG) (SEQ ID NOs:12, 13) of the selected clones. These
primers were not based upon the human 3'EST sequence, but were specific for the 3' portion
of previously sequenced cDNA inserts, and therefore were specific for the 5'-portion of the
mRNA. Colonies were again screened and insert size determined by PCR as described above.
However, rather than using the T7 primer, an additional sequence specific primer P7 (5'-
GACCTTGCTGGAGAAAGG) (SEQ ID NO:14) was used as a primer in PCR with the SP6
primer to determine the size of the plasmid inserts. PCR-positive colonies which contained
the largest inserts were sequenced.

5'RACE analysis. Additionally, to confirm the sequence of the 5' terminus of the mRNA, the
5' RACE System (GibcoBRL, Life Technologies) was used to amplify the 5' terminus of the
message for sequencing. The manufacturer's protocol for GC rich cDNAs was followed.
Briefly, 2.5 pmoles of a gene specific primer P8 (5'-CATGAGTGACAGGAGCAGGT) (SEQ
ID NO:15) were hybridized to 100 ng human spleen poly A+ RNA (Clontech) and cDNA
synthesized using Superscript II Reverse Transcriptase (GibcoBRL, Life Technologies). The
RNA was then degraded with RNAse, and the cDNA isolated. A poly dC tail was then added
to the 3'-terminus of the purified cDNA using dCTP and TdT, and the cDNA region
corresponding to the 5' end of the mRNA was amplified by two successive rounds of PCR
using additional gene specific primers P9 (5'-GATAGGGGAGAGGCAGTGTC, SEQ ID
NO:16), and P10 (5'-CTTCTGCTGCAGCCTCTCTTC, SEQ ID NO:17), together with the
anchor primers supplied by the manufacturer.

The second round of PCR yielded a single amplified product which, when analyzed by
electrophoresis on a 1.2% agarose gel, corresponded in size to what was expected on the
basis of the longest GENETRAPPER-isolated cDNA sequences. The PCR product was gel
purified and cloned into the pCR II cloning vector (Invitrogen) using the TA cloning kit
(Invitrogen), electroporated into the E. coli strain DH5α and plated onto LB/amp agar plates.
Colonies were used to inoculate Lennox L broth cultures containing 50 ug/ml ampicillin
(LB/amp broth) and the inserts of 10 isolated plasmids sequenced.

DNA Sequencing. Plasmid DNA was purified for sequencing using the QIAprep Spin
Plasmid Miniprep Kit (QIAGEN) from 5 ml LB/amp broth cultures (containing 50 ug/ml
ampicillin) incubated for 16 hours at 37°C. DNA sequencing was carried out by the NYU
Kaplan Cancer Center sequencing facility, using a Model 373 automated DNA sequencer
(ABI), and Model 800 Lab Station (ABI).

Construction of a GST-Fusion Protein in pGEX-2T. The DNA sequence encoding amino
acids (8-304) of the open reading frame (FIGURE 8) was amplified via PCR from 50 ng of
the purified cDNA-containing plasmid using the following primers: P11 (5'-
GTTGGATGCATGGTGAGCGGGAGCCGGAGG) (SEQ ID NO:18), and P12 (5'-
CTCGAATTCGAGGCATGCGGGGGTCGGAGA) (SEQ ID NO:19). These primers were
designed to incorporate Bam HI and Eco RI restriction sites into the 5' and 3' ends of the
sense strand, respectively. PCR was conducted as follows: 1 cycle of 95°C for 2 min; 35
cycles of 94°C for 1 min, 65°C for 1 min, and 72°C for 2 min; followed by a final cycle of 10
min at 72°C. The resulting PCR product was digested with Bam HI and Eco RI, gel
purified and then ligated into gel purified pGEX-2T vector (Pharmacia) which had previously
been digested with Bam HI and Eco RI, and electroporated into the E. coli strain NB42.
Colonies were selected via growth on LB agar/amp plates, and the presence of the
appropriate insert verified via colony PCR as described above, using primers P3 and P4.
Expression of the full length fusion protein was confirmed via the induction of log phase
(A590=0.6) 5 ml LB/amp broth cultures with 0.1 mM IPTG for 4 hours at 37°C. To prepare
total cell SDS lysates, 1 ml aliquots of induced and uninduced cultures were centrifuged at
5000 x g for 2 min, the supernatant was discarded, and the pelleted bacteria were
resuspended in 100 ul of SDS-PAGE loading buffer and heated at 95°C for 5 min. Thirty ul
of each sample was then analyzed on a 15% Tricine gel. After the gels were stained with
Coomassie Blue, induced and uninduced samples were compared to demonstrate the
expression of the full length (65 kDa) fusion protein. Bacterial lysates produced in an
identical manner were also run on the SDS-PAGE gel in FIGURE 10 in order to demonstrate
induction of the GST-fusion protein.

Protein Expression and Purification. 600 ml of LB/amp broth were inoculated with 10 ml of
overnight cultures. Bacteria were grown at 37°C until the A590 reached 0.6. Expression of
the fusion protein was induced by incubation with 0.1 mM IPTG for 5 hours at 30°C (the
lower temperature was used to increase the solubility of the fusion protein). Bacteria were
then placed on ice for 1 hour, and pelleted by centrifugation at 3200 x g in 250 ml centrifuge
tubes (Corning) for 10 min. The supernatant was discarded, and the pellet resuspended in 20
ml of sonication buffer (50 mM Tris, pH 8.0, 500 mM NaCl, 5 mM EDTA, 0.5% Triton X-100,
0.25 mM PMSF, 0.1 mg/ml Aprotinin). The bacteria were transferred to a 30 ml Corex
centrifuge tube, and sonicated for 2 min at 70% power using a Heat Systems model W-375
sonicator equipped with a model 419 standard tapered microtip. The sonicate was then
centrifuged for 15 min at 10,000 x g, and the supernatant transferred to a 50 ml plastic
centrifuge tube containing 1.2 ml glutathione agarose 4B affinity media (volume of media
was measured as a slurry in 20% ethanol, as supplied by the manufacturer) prewashed with 2
x 40 ml of wash buffer (50 mM Tris, pH 8.0, 500 mM NaCl, 5 mM EDTA, 0.5% Triton X-100).
The sample was incubated on ice with agitation for 30 min to allow adsorption of the
fusion protein. The affinity media was then pelleted by centrifugation for 2 min at 950 x g.
The supernatant was removed by pipetting, and the affinity media washed once with 20 ml of
sonication buffer, and 4 times with 40 ml of wash buffer by thorough resuspension of the
beads in the appropriate buffer followed by centrifugation at 950 x g for 1 min. After the
final wash, the affinity media was resuspended in 1 ml of wash buffer, transferred to a 2 ml
plastic tube, and centrifuged again at 950 x g for 1 min to pellet the beads. The supernatant
was removed, and the beads were resuspended in 1 ml glutathione-agarose elution buffer
(100 mM Tris, pH 8.0, 500 mM NaCl, 2.5 mM EDTA, 0.1% Triton X-100, 20 mM
glutathione (Sigma)) and incubated for 12 hours on ice with agitation. Beads were then
quickly pelleted by centrifugation at 950 x g and the supernatant, which contained the eluted
fusion protein, transferred to a fresh tube. All purification procedures from sonication
through elution of the fusion protein were carried out at 4°C. The purification yielded 9.9 mg
of fusion protein.



As a control, the 26 kDa glutathione S-transferase (GST) of S. japonicum was expressed from
the pGEX-2T vector (without a fusion insert) in the bacterial strain NB42 according to the
same procedure described for the fusion protein. Twelve mg of GST was purified
from 600 ml of induced bacterial culture.
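
As a quick check on the two preparations, the yields can be normalized per litre of culture and expressed in molar terms. The Python sketch below assumes the masses quoted in this Example (approximately 65 kDa for the fusion protein and 26 kDa for GST); these are nominal values, so the molar figures are approximate. The function names are hypothetical.

```python
def yield_mg_per_litre(mass_mg: float, culture_ml: float) -> float:
    """Purified protein recovered per litre of induced culture."""
    return mass_mg / (culture_ml / 1000.0)


def nanomoles(mass_mg: float, mass_kda: float) -> float:
    """Amount of protein in nmol: (mg / kDa) equals micromoles, times 1000."""
    return mass_mg / mass_kda * 1000.0


if __name__ == "__main__":
    preparations = [
        # (label, purified mass in mg, culture volume in ml, nominal mass in kDa)
        ("GST-fusion protein", 9.9, 600.0, 65.0),
        ("GST control", 12.0, 600.0, 26.0),
    ]
    for label, mg, ml, kda in preparations:
        print(f"{label}: {yield_mg_per_litre(mg, ml):.1f} mg/L, "
              f"about {nanomoles(mg, kda):.0f} nmol")
```

On these assumptions the fusion protein yield corresponds to 16.5 mg per litre of culture and the GST control to 20 mg per litre.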






Purification of E. coli endonuclease III. Endonuclease III was purified from E. coli strain
UC6444 carrying the plasmid pHIT1 as previously described [Asahara, et al., 1989, supra].

Spectrophotometry. Spectrophotometric measurements of proteins were made in elution
buffer (100 mM Tris, pH 8.0, 500 mM NaCl, 2.5 mM EDTA, 0.1% Triton X-100, 20 mM
glutathione) in a quartz cuvette. The optical absorption spectra of the GST-fusion protein
and the unfused GST (glutathione S-transferase) protein were recorded between 200 nm and
700 nm using a Spectronic Genesystems 5 spectrophotometer (Milton Roy). In order to allow
comparison of the absorption spectra of the purified GST-fusion protein and purified GST
(FIGURE 13), the purified proteins were diluted prior to analysis with glutathione-agarose
elution buffer to the same absolute protein concentration (5.5 mg/ml).



FISH analysis. FISH analysis was performed by SeeDNA Biotech Inc., Dept. of Biology,
York University, Ontario, Canada. Lymphocytes isolated from human blood were cultured
in α-minimal essential medium (MEM) supplemented with 10% fetal calf serum and
phytohemagglutinin (PHA) at 37°C for 68-72 hours. The lymphocyte cultures were treated
with BrdUrd (0.18 mg/ml; Sigma) to synchronize the cell population. The synchronized cells
were washed 3 times with serum-free medium to release the block and recultured at 37°C for
6 hours in α-minimal essential medium with thymidine (2.5 mg/ml; Sigma). Cells were
harvested and slides made by using standard procedures, including hypotonic treatment,
fixing, and air-drying.

To produce a probe for FISH analysis, a 1.1 kb fragment containing the entire cDNA
sequence was excised from an isolated cDNA clone using Eco RI and Hind III, purified and
labeled with biotin-14-dATP using the BioNick labeling kit (Gibco BRL) [Heng, et al., Proc.
Natl. Acad. Sci. U.S.A., 89:9509-9513 (1992)]. The procedure for FISH analysis was
performed according to the previously reported procedures of Heng, et al. [Chromosoma,
102:325-332 (1993); Methods in Molecular Biology: In Situ Hybridization Protocols, pp. 35-
49 (1994)]. Briefly, slides were baked at 55°C for 1 h. After RNAse treatment, the slides
were denatured in 70% formamide, 2 x SSC for 2 min at 70°C followed by dehydration with
ethanol. Probes were denatured at 75°C for 5 min in a hybridization solution containing 50%
formamide, 10% dextran sulphate and human Cot I-restricted DNA. Probes were loaded on
the denatured chromosomal slides. After overnight hybridization, slides were washed and
analyzed. FISH signals and the DAPI banding pattern were recorded separately by taking
photographs. Chromosomal localization was achieved by superimposing FISH signals with
DAPI banded chromosomes [Heng, et al., 1994, supra].



Northern Blot Analysis. Two ug of mRNA, isolated from 293T cells using the FastTrack 2.0
mRNA isolation system (Invitrogen), 1 ug of human spleen Poly A+ RNA (Clontech), and 5
ug of 0.24-9.5 Kb RNA Ladder (GibcoBRL) were electrophoresed on an 11 x 14 cm 1.0%
agarose-formaldehyde gel. The gel was rinsed with deionized water and RNA transferred to
a Nytran membrane (Schleicher & Schuell) using the Turboblotter rapid downward transfer
system (Schleicher & Schuell), according to the manufacturer's specifications. Following
transfer, the membrane was gently washed in 2 x SSC for 5 min, dried on a fresh sheet of
filter paper, and baked at 80°C for 1 hour. The portion of the membrane which contained the
molecular weight markers was cut away and stained by treatment with 5% acetic acid for 15
min, then with 0.5 M sodium acetate, pH 5.2, containing 0.04% methylene blue for 10 min,
followed by destaining with water. The baked filter was incubated in prehybridization
solution (50% formamide, 3 x SSC, 0.1 M Tris, pH 7.4, 5 x Denhardt's solution) for 4 hours
at 42°C, followed by hybridization overnight at 42°C with 2x10* cpm of radiolabeled probe/ml
of hybridization solution (50% formamide, 3 x SSC, 0.1 M Tris, pH 7.4, 5 x Denhardt's
solution, 10% dextran sulfate). Following hybridization, the membrane was washed 3 times
for 30 min at 50°C, successively with 1 x SSC, 0.1% SDS; 0.5 x SSC, 0.1% SDS; and 0.1 x SSC,
0.1% SDS. The membrane was exposed to X-ray film for 24 hours at -70°C. The
autoradiogram was matched to the prestained markers to determine the size of the native
mRNA. Before hybridization with the cDNA-specific probe, the Northern blot membrane
was analyzed by hybridization to a β-actin specific probe to confirm the integrity of the
mRNA. After hybridization to the β-actin probe detected an mRNA species of the predicted
size (approximately 2.1 Kb), the membrane was stripped by boiling for 30 min in 0.1 x SSC,
0.5% SDS and probed according to an identical procedure with the probe specific for the
human homologue of endonuclease III (FIGURE 8).

Preparation of Probes for Northern Blot Analysis. The β-actin probe was produced by PCR
with sequence specific primers (Clontech) against cDNA made from the RNA of cells taken
from a sample of a human bone marrow aspirate. PCR was conducted as follows: 1 cycle of
95°C for 2 min; 35 cycles of 94°C for 1 min, 60°C for 1 min, 72°C for 1 min; followed by a
final cycle of 10 min at 72°C. The probe was then radiolabeled using the Random Primed
DNA Labeling Kit (Boehringer-Mannheim) and [α-32P] dCTP, and purified using Nick-Spin
columns (Pharmacia). The specific probe for the human homologue of endonuclease III
was prepared by excising the full length cDNA sequence shown in FIGURE 8 from
2 ug of purified plasmid DNA via restriction with Eco RI and Bam HI, followed by gel
purification of the restricted fragment. The probe was radiolabeled and hybridized to the
Northern blot membrane, as described.






DNA Glycosylase Assay. Poly(dA-[3H]dT) was produced by nick-translation of the
alternating copolymer poly(dA-dT) (Pharmacia) with [5'.5-3H] TTP, followed by oxidation
with osmium tetroxide to form thymine glycol residues [Higgens, et al., 1987, supra].
Thymine glycol-containing poly(dA-[3H]dT) produced in this manner had a specific activity
of approximately 1.4x10^7 dpm/ug. Thymine glycol DNA-glycosylase assays were carried
out against oxidized DNA, and the released radioactive product was proven to be thymine
glycol by HPLC analysis as previously described [Higgens, et al., 1987, supra].
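
For comparison with activities quoted in curies elsewhere in this Example, the specific activity of the substrate can be converted using the definition 1 Ci = 3.7 x 10^10 disintegrations per second. The Python sketch below is illustrative only; the function names are hypothetical, and the input value of 1.4 x 10^7 dpm/ug is the figure given in the text.

```python
BECQUERELS_PER_CURIE = 3.7e10                 # disintegrations per second per Ci
DPM_PER_CURIE = BECQUERELS_PER_CURIE * 60.0   # disintegrations per minute per Ci


def dpm_to_microcuries(dpm: float) -> float:
    """Convert disintegrations per minute to microcuries."""
    return dpm / DPM_PER_CURIE * 1e6


def dpm_to_becquerels(dpm: float) -> float:
    """Convert disintegrations per minute to becquerels (per second)."""
    return dpm / 60.0


if __name__ == "__main__":
    specific_activity_dpm_per_ug = 1.4e7   # value from the text
    print(f"{dpm_to_microcuries(specific_activity_dpm_per_ug):.2f} uCi/ug")
    print(f"{dpm_to_becquerels(specific_activity_dpm_per_ug):.0f} Bq/ug")
```

On this basis the substrate carries roughly 6.3 uCi of tritium per ug of polymer.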

Sodium cyanoborohydride mediated cross-linking of fusion protein to a thymine glycol
containing oligodeoxynucleotide. A double stranded oligodeoxynucleotide containing a
single thymine glycol residue was prepared as described previously [Hilbert, et al., 1996,
supra; Kao, et al., 1993, supra]. The thymine glycol containing strand was 5'-end labeled
with [γ-32P] dATP, using T4 kinase (Gibco BRL) according to the manufacturer's
recommendations, and purified using a ChromaSpin-10 column (Clontech).



The purified GST-fusion protein, the non-fusion GST protein and E. coli endonuclease III
were reacted with the substrate double-stranded oligodeoxynucleotide in a total volume of 50
ul under the following reaction conditions: 37.3 mM NaCNBH3, 20 mM HEPES, pH 7.5,
46.5 mM KCl, 5 mM EDTA, 4.0 uM of each oligodeoxynucleotide, 40 ng/ul protein. In the
case of E. coli endonuclease III, this represented approximately a 4-fold molar excess of
substrate deoxyoligonucleotide to enzyme. After incubation at room temperature for 2 hours,
a 25 ul volume of 3 x SDS PAGE loading buffer was added to each sample. Samples were then
heated to 90°C for 5 min and separated by electrophoresis on a 15% Tricine-SDS gel.
Following electrophoresis, the gel was stained with Coomassie Blue, wrapped in plastic, and
analyzed via autoradiography.

Gel Electrophoresis. Prior to electrophoresis all samples were incubated at 95°C for 5 min in
standard SDS PAGE loading buffer. Fifteen percent Tricine gels [Schagger, et al., 1987,
supra] were prepared and run using the Mini-Protean II electrophoresis system (Bio-Rad).
Gels were run at 90 V for approximately 5 hours, completion being determined by the
progress of pre-stained low molecular weight electrophoresis standards (Bio-Rad). Gels were
then stained with Coomassie Blue.

The nucleotide sequence of a cDNA corresponding to the human homologue of E. coli
endonuclease III is shown in FIGURE 8. The 1045 bp cDNA contains a putative open
reading frame (ORF) of 912 bp, which encodes a protein of 304 amino acids having a
calculated molecular mass of 33,569 and pI of 9.85. The nucleotide sequence data were
obtained from two sources. The sequence of nucleotides 6 to 1045 was obtained by analysis
of clones isolated from a cDNA library using probes based upon the sequence of the
previously described human 3' EST. The sequence of nucleotides 1 to 5 was obtained by
sequencing the products of 5' RACE using gene-specific primers based upon the sequence of
the longest cDNA clones.
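
The relationship between the ORF and the encoded protein can be checked with a short sketch. The Python below is illustrative only and is not part of the disclosure: the names `CODON_TABLE`, `translate` and `CDNA_5PRIME` are hypothetical, the standard genetic code is assumed, and the ORF is assumed to begin at the first ATG of SEQ ID NO:1. Only the first 60 nucleotides of SEQ ID NO:1 are used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (illustrative helper).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate the complete codons of `dna`, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 60 nucleotides of SEQ ID NO:1 (spaces removed).
CDNA_5PRIME = "AGTCCGGCATGACCGCCTTGAGCGCGAGGATGCTGACCCGGAGCCGGAGCCTGGGACCCG"

orf_start = CDNA_5PRIME.find("ATG")      # 0-based index of the first ATG
n_terminus = translate(CDNA_5PRIME[orf_start:])

print(orf_start + 1)   # 9 (1-based position of the first ATG)
print(n_terminus)      # MTALSARMLTRSRSLGP
print(912 // 3)        # 304 codons in a 912 bp ORF
```

The translated residues agree with the first 17 residues of SEQ ID NO:2, and 912 bp corresponds to 304 codons, excluding the stop codon.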

In the previous Examples, the sequences of four peptides obtained by proteolysis of a purified
bovine pyrimidine hydrate-thymine glycol DNA glycosylase/AP lyase were disclosed
[Hilbert, et al., 1996, supra]. The sequences of those 4 peptides, as well as that of one
additional peptide (GEGGEGAEHLQAP) (SEQ ID NO:24) derived from the same purified
protein, also depicted in FIGURE 8, are shown aligned with the homologous sequences
encoded within the ORF of the human cDNA.

The 1045 bp sequence of FIGURE 8 likely represents most, if not all, of the full-length
cDNA. The northern blot analyses (FIGURE 9) of human splenic and 293T cell (human)
mRNA each demonstrate a predominant mRNA species of approximately 1.1-1.2 kb which
hybridized to a 32P-labeled probe containing the entire sequence of the ORF. A difference of
approximately 50 to 150 nucleotides in length between the cDNA sequence presented in
FIGURE 8 and the native mRNA is explained by the expected presence, on the native species,
of a polyA tail of approximately that length and perhaps a few more nucleotides 5' to
the first AUG codon.
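
The size comparison above is simple arithmetic, sketched here in Python for convenience. The variable names are hypothetical, and the 1.1-1.2 kb band is taken as exactly 1100-1200 nucleotides, which is an assumption given the resolution of a northern blot.

```python
CDNA_LENGTH_NT = 1045                 # length of the cDNA of FIGURE 8
MRNA_SIZE_RANGE_NT = (1100, 1200)     # approx. 1.1-1.2 kb band on the northern blot

# Nucleotides of the native mRNA not accounted for by the cDNA sequence.
unaccounted = tuple(size - CDNA_LENGTH_NT for size in MRNA_SIZE_RANGE_NT)
print(unaccounted)  # (55, 155)
```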

FIGURE 9, lane 3, which contains mRNA extracted from 293T cells, shows a
faint band of higher MW. Although it is postulated that this band is non-specific, the
possibility that it represents mRNA encoding a protein similar to human endonuclease III
cannot be excluded. This is the case in S. cerevisiae, which contains two homologues
of E. coli endonuclease III, one believed to be nuclear, the other mitochondrial (see FIGURE
15 and Discussion, infra).

To demonstrate that the cDNA sequence of FIGURE 8 encodes a functional homolog of
endonuclease III, a GST fusion protein was constructed consisting of amino acid residues 8
to 304 of the ORF fused to the C-terminus of the 30 kDa GST protein. SDS-PAGE analysis
of the IPTG-induced, affinity-purified fusion protein (FIGURE 10) revealed a predominant
65 kDa full-length protein. Two additional lower molecular weight protein species were
present in the purified preparation. It is believed that these proteins are fragments of the 65
kDa protein arising from the abortive synthesis of the full-length protein, or from proteolysis
occurring before, during, and after cell lysis and affinity purification due to the action of
contaminating cellular proteases.

As described previously, E. coli endonuclease III can be specifically, irreversibly cross-linked
to a thymine glycol-containing oligodeoxynucleotide via the reductive stabilization of
its characteristic ES intermediate [Hilbert, et al., 1996, supra]. To further confirm that the
ORF presented in FIGURE 8 encodes a fully functional homologue of E. coli endonuclease
III, the cross-linking reaction, as described in the Experimental Procedure section (above),
was applied to the purified GST-fusion protein. The results of this reaction are illustrated in
FIGURE 11. When aliquots of the purified GST-fusion protein incubated with 32P-labeled
thymine glycol-containing oligodeoxynucleotide in the absence (Lane 6) or presence (Lane
7) of sodium cyanoborohydride (NaCNBH3) were compared by SDS-PAGE analysis, it was
evident that a portion of the protein had been irreversibly cross-linked to the
oligodeoxynucleotide. This is manifest by an increase in the apparent molecular weight of
the enzyme, resulting in the formation of the doublet shown in Lane 7. The shift is analogous
to that observed when endonuclease III was subjected to the same reductive cross-linking
reaction (Lane 3), and compared with native endonuclease III (Lane 2). No shift of the major
protein species was observed when the non-fusion GST protein (Lane 3) was incubated under
reducing conditions with the thymine glycol-containing oligodeoxynucleotide (Lane 4).

An autoradiogram of the gel in FIGURE 11A is presented in FIGURE 11B. As previously
described, the thymine glycol-containing oligodeoxynucleotide was 5'-end labeled with 32P
prior to incubation with the proteins. Thus, cross-linking was confirmed by this
autoradiogram, in which predominant radioactive species were present only in Lanes 2 (E.
coli endonuclease III plus NaCNBH3) and 7 (GST fusion plus NaCNBH3), which correspond
in apparent MW to the shifted species seen on the Coomassie Blue-stained gel. Also evident
on the autoradiogram in Lane 7 are two visible but less intense lower molecular weight
bands which correspond in position to presumed degradation products of the fusion protein
present even after affinity purification (FIGURE 10). Presumably these represent cross-linked,
partially degraded fusion protein.

After purification, the fusion protein was also analyzed for thymine glycol-DNA glycosylase
activity. FIGURE 12 presents the V vs. [E] plot in which thymine glycol release is
expressed as a function of increasing content of fusion protein. The release of thymine
glycol is linear with respect to fusion protein concentration over the range of protein amounts used.
Based on the results of this plot, the specific enzymatic activity of the fusion protein was
calculated to be about 1-2% of that of genetically engineered E. coli endonuclease III using the same
assay. This reduced level of activity is apparently quite common among GST fusion
proteins. GST protein which contained no C-terminal fusion was induced and purified in a
manner identical to the fusion protein and assayed for enzymatic activity. This non-fusion
GST protein did not demonstrate detectable thymine glycol-DNA glycosylase activity at a
protein concentration 3 orders of magnitude higher than that at which the fusion protein was
assayed.

As documented previously, E. coli endonuclease III contains an iron-sulfur cluster in which
a cubane [4Fe-4S] moiety is liganded by four cysteine residues. This domain produces a
distinctive absorbance at 410 nm [Thayer, et al., 1995, supra]. Conservation of this [4Fe-4S]
cluster in the human enzyme was inferred on the basis of the cDNA sequence of
FIGURE 8, since the putative ORF contains the appropriate four cysteine residues at amino
acid positions 282, 289, 292 and 298, and confirmed by taking an absorption spectrum of the
purified GST-fusion protein, which revealed that it too absorbed strongly at 410 nm (FIGURE
13).

Purified E. coli endonuclease III has a characteristic absorption peak at 410 nm and, as
expected, solutions containing approximately 0.5 mg/ml or greater of purified endonuclease
III are typically yellow-brown [Asahara, et al., 1989, supra]. Similarly, a solution of the
purified GST fusion protein at similar concentrations of protein was also yellow, while a
solution of the simultaneously purified non-fusion GST protein was colorless.

In order to determine the chromosomal localization of the gene encoding the mammalian
enzyme, FISH analysis was performed as described in Experimental Procedures (above).
Under the conditions used, hybridization efficiency for our probe was approximately 70%
(i.e., among 100 mitotic spreads analyzed, 70 demonstrated binding of the probe to one pair
of chromosomes). DAPI banding was used to identify the chromosome pair to which the
probe had bound (chromosome 16). The precise localization of the gene (16p13.2) was
determined by the summary analysis of 10 pairs of photographs in which the probe signal
was matched with the results of DAPI banding (FIGURE 14). There was no additional locus
detected by FISH analysis. These results, taken together with the presence of a single mRNA
species on Northern analysis, indicate that the gene for human endonuclease III is a
single-copy gene.

Discussion 

The human sequence of FIGURE 8 shows remarkable similarity to that of several other
putative homologs of the E. coli endonuclease III (Acc. no. J02847) found in representative
species of all 3 biologic domains. In bacteria they have been found in both Gram-negative
(H. influenzae, NCBI Seq. ID 1169526) and Gram-positive (B. subtilis, NCBI Seq. ID 729418)
organisms; among archaea, in M. jannaschii (NCBI Seq. ID 1510694); and among
eukaryotes, in S. pombe (NCBI Seq. ID 1065894), S. cerevisiae (NCBI Seq. ID 1419843 and
401436), C. elegans (NCBI Seq. ID 974795), Rattus sp. (Acc. no. H33255) and H. sapiens
(Acc. no. F04657). The S. cerevisiae genome encodes two distinct theoretical homologues of
E. coli endonuclease III. The alignment of the 9 putative homologous sequences using the
program Clustal W (1.5) (FIGURE 15) reveals that a core sequence of amino acids is
remarkably well conserved. In bacteria, the core sequence comprises virtually the entire
protein. In contrast, the proteins of archeons and eukaryotes have unique extensions at their
N- and/or C-termini. For the sake of clarity these extensions have been omitted from
FIGURE 15.

Based upon similarities among several bacterial DNA glycosylases, site-directed mutagenesis
studies, and molecular modeling, several regions and residues within the core sequence of
amino acids of E. coli endonuclease III could be involved in DNA binding and catalysis
[Thayer, et al., 1995, supra]. The region surrounding glutamine 41 (residue numbers refer to
the E. coli endonuclease III amino acid sequence, unless otherwise indicated) may form a
portion of the substrate binding pocket, in which the damaged pyrimidine fits when in the
"flipped out" conformation which the enzyme recognizes. The Helix-hairpin-Helix (HhH)
motif encoded by the residues surrounding the central LPGVG sequence (residues 114-118)
(SEQ ID NO:3) is thought to function in non-specific DNA recognition. This analysis has
been extended to show that similar HhH motifs occur in 14 homologous families of DNA
binding proteins, including DNA glycosylases, DNA polymerases, and "flap" endonucleases
[Doherty, et al., Nucleic Acids Res., 24:2488-2498 (1996)]. Lysine 120 appears to be the
nucleophile in the active site of endonuclease III which contributes the ε-amino group
necessary for the formation of the N-acylimine ES intermediate characteristic of DNA
glycosylase/AP lyases. Aspartic acid 138 has also been implicated as a functional active site
residue. All of these residues appear to be well conserved in all of the 9 sequences shown.
The structure of E. coli endonuclease III was recently solved [Thayer, et al., 1995, supra]
and, in light of the high degree of conservation of critical residues, it is likely that the
common core sequence of all members of the endonuclease III family will have a similar
three-dimensional structure.

In addition to the previously mentioned residues, 4 highly conserved cysteine residues (187,
194, 197, 203) have been identified within this common core sequence which contribute to
the [4Fe-4S] cluster of E. coli endonuclease III. Examination of the aligned sequences in
FIGURE 15 reveals that in E. coli endonuclease III and 5 of its 8 putative homologues,
including the human enzyme, these 4 cysteines are arranged according to the consensus
sequence Cys-X6-Cys-X2-Cys-X5-Cys (SEQ ID NO:21). A similar but slightly modified
sequence appears in S. pombe (Cys-X6-Cys-X2-Cys-X7-Cys) (SEQ ID NO:22) and M.
jannaschii (Cys-X5-Cys-X2-Cys-X7-Cys) (SEQ ID NO:23). The basic amino acid residues
between the first two cysteines of the [4Fe-4S] cluster may form a loop which functions in
the nonspecific binding of DNA [Thayer, et al., 1995, supra]. While FIGURE 15 does not
indicate absolute conservation of these residues, some conservation is apparent, especially
with respect to arginine 193.
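
The cysteine spacing patterns named above can be expressed as regular expressions. The sketch below is illustrative only and is not part of the disclosure: `motif_regex`, `motif_length` and the dictionary labels are hypothetical names, and the test fragment is residues 277-304 of SEQ ID NO:2 written in one-letter code.

```python
import re


def motif_regex(spacings):
    """Build a regex for Cys-Xn-Cys-... from the gaps between successive cysteines."""
    return re.compile("C" + "".join(f".{{{n}}}C" for n in spacings))


def motif_length(spacings):
    """Total residues spanned by the motif: the cysteines plus the gaps."""
    return len(spacings) + 1 + sum(spacings)


# Spacings taken from the text; the keys are illustrative labels only.
MOTIFS = {
    "consensus (SEQ ID NO:21)": (6, 2, 5),
    "S. pombe (SEQ ID NO:22)": (6, 2, 7),
    "M. jannaschii (SEQ ID NO:23)": (5, 2, 7),
}

# Residues 277-304 of SEQ ID NO:2 in one-letter code.
HUMAN_C_TERMINUS = "FGQQTCLPVHPRCHACLNQALCPAAQGL"
FIRST_RESIDUE = 277

matches = {
    name: motif_regex(gaps).search(HUMAN_C_TERMINUS) is not None
    for name, gaps in MOTIFS.items()
}
consensus_hit = motif_regex(MOTIFS["consensus (SEQ ID NO:21)"]).search(HUMAN_C_TERMINUS)
first_cys = FIRST_RESIDUE + consensus_hit.start()

print(matches)      # only the consensus pattern matches the human fragment
print(first_cys)    # 282
print({name: motif_length(gaps) for name, gaps in MOTIFS.items()})
```

The computed motif lengths (17, 19 and 18 residues) agree with the lengths given for SEQ ID NOS:21-23 in the Sequence Listing.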

As mentioned previously, the genome of S. cerevisiae encodes two putative homologues of
E. coli endonuclease III, one of which, designated Sce non Fe-S in FIGURE 15 (NCBI Seq.
ID 1419843), lacks the four-cysteine [4Fe-4S] motif completely and presents an obvious
exception to this consensus sequence. However, this sequence also encodes a putative
mitochondrial leader sequence [Ouellette, et al., Genome, 36:32-42 (1993)]. Whether pairs of
endonuclease III-like proteins, with and without [4Fe-4S] clusters, are present in other
eukaryotic organisms and whether the non Fe-S proteins are mitochondrial remains to be
determined.

This interesting question notwithstanding, the presence of endonuclease III-like enzymes in
representative species of all three evolutionary domains suggests that the genomic DNA of
organisms throughout phylogeny is subject to endogenous stresses which attack the 5,6
double bonds of pyrimidine residues. Previously well characterized substrates of
endonuclease III include oxidized pyrimidines such as thymine glycol and 5-hydroxycytosine
and hydrates of cytosine and uracil. The oxidation of DNA bases has been primarily
attributed to reactive oxygen species formed as byproducts of oxidative metabolism and
inflammation. The formation of pyrimidine hydrates has been primarily attributed to the
action of UV radiation [reviewed in Teebor, DNA Repair Mechanisms and Cancer, pp. 99-123
(1995)]. The archeon M. jannaschii lives beneath the sea and therefore is not exposed to
direct sunlight. Furthermore, it is characterized by a reducing rather than an oxidizing
metabolism [Bult, et al., Science, 273:1058-1073 (1996)]. The identification of a homologue
of endonuclease III in the genome of this organism suggests that pyrimidines with reduced
5,6 double bonds such as 5,6-dihydrothymine may be formed spontaneously in archeon
genomic DNA. Perhaps within this evolutionary domain, it is primarily the formation of
such reduced rather than oxidized or photohydrated pyrimidine residues which has promoted
the conservation of an endonuclease III-like enzyme.

At this time, the specific contribution which the human pyrimidine hydrate-thymine glycol
DNA glycosylase/AP lyase activity makes to the maintenance of the genome is uncertain.
The human gene encoding this enzyme was localized to the locus 16p13.2-.3 by FISH
analysis (FIGURE 14). The accuracy of this localization was corroborated through the
identification of a genomic database nucleotide sequence (Acc. no. L48777) obtained by exon
trapping from this same region of chromosome 16 [Burn, et al., Gene (Amst.), 161:183-187
(1995)], which is 94.1% identical to nucleotides 699-799 of the sequence of FIGURE 8.
The chromosomal locus of the human endonuclease III homologue is in very close proximity to
that of another DNA base excision repair enzyme, 3-methylpurine DNA glycosylase, as well
as the DNA nucleotide excision repair gene, ERCC-4. There is no apparent homology among these
3 proteins, so it seems unlikely that their localization to the same chromosomal region is the
result of gene duplication and divergence. Loss of heterozygosity in this region has been
reported to occur in 22% of human hepatocellular carcinomas [Sakai, et al., J.
Gastroenterol. Hepatol., 7:288-292 (1992)]. Whether any or all of these DNA repair
proteins act as tumor suppressors for human hepatocarcinogenesis remains to be determined.
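
The percent identity figure quoted above can be illustrated with a minimal sketch. The function `percent_identity` and the toy sequences are hypothetical and not taken from the disclosure; the count of 95 identical positions is an inference from the reported 94.1% over nucleotides 699-799 and is not stated in the text.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy sequences (hypothetical): one mismatch in ten positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0

# Nucleotides 699-799 inclusive span 101 positions; 95 identical positions
# would round to the reported figure (an inference, see lead-in).
window = 799 - 699 + 1
print(window, round(100.0 * 95 / window, 1))  # 101 94.1
```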

Various references are cited throughout this specification, each of which is incorporated 
herein by reference in its entirety. 

This invention may be embodied in other forms or carried out in other ways without
departing from the spirit or essential characteristics thereof. The present disclosure is
therefore to be considered as in all respects illustrative and not restrictive, the scope of the
invention being indicated by the appended Claims, and all changes which come within the
meaning and range of equivalency are intended to be embraced therein.



SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: Teebor, George W.

Hilbert, Timothy P. 

(ii) TITLE OF INVENTION: MAMMALIAN ENDONUCLEASE III AND 
DIAGNOSTIC AND THERAPEUTIC USES THEREOF 

(iii) NUMBER OF SEQUENCES: 42

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: David A. Jackson, Esq 

(B) STREET: 411 Hackensack Ave, Continental Plaza, 4th 

Floor 

(C) CITY: Hackensack 

(D) STATE: New Jersey 

(E) COUNTRY: USA 

(F) ZIP: 07601 

(V) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 

(B) FILING DATE: 26-FEB-1997 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Jackson Esq., David A. 

(B) REGISTRATION NUMBER: 26 742 

(C) REFERENCE/DOCKET NUMBER: 1049-1-001 N 

(ix) TELECOMMUNICATION INFORMATION- 

(A) TELEPHONE: 201-487-5800 

(B) TELEFAX: 201-343-1684 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1045 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AGTCCGGCAT GACCGCCTTG AGCGCGAGGA TGCTGACCCG GAGCCGGAGC CTGGGACCCG 60

GGGCTGGGCC GCGGGGGTGT AGGGAGGAGC CCGGGCCTCT CCGGAGAAGA GAGGCTGCAG 120

CAGAAGCGAG GAAAAGCCAC AGCCCCGTGA AGCGTCCGCG GAAAGCACAG AGACTGCGTG 180

TGGCCTATGA GGGCTCGGAC AGTGAGAAAG GTGAGGGGGC TGAGCCCCTC AAGGTGCCAG 240

TCTGGGAGCC CCAGGACTGG CAGCAACAGC TGGTCAACAT CCGTGCCATG AGGAACAAAA 300

AGGATGCACC TGTGGACCAT CTGGGGACTG AGCACTGCTA TGACTCCAGT GCCCCCCCAA 360

AGGTACGCAG GTACCAGGTG CTGCTGTCAC TGATGCTCTC CAGCCAAACC AAAGACCAGG 420

TGACGGCGGG CGCCATGCAG CGACTGCGGG CGCGGGGCCT GACGGTGGAC AGCATCCTGC 480

AGACAGATGA TGCCACGCTG GGCAAGCTCA TCTACCCCGT CGGTTTCTGG AGGAGCAAGG 540

TGAAATACAT CAAGCAGACC AGCGCCATCC TGCAGCAGCA CTACGGTGGG GACATCCCAG 600

CCTCTGTGGC CGAGCTGGTG GCGCTGCCGG GTGTTGGGCC CAAGATGGCA CACCTGGCTA 660

TGGCTGTGGC CTGGGGCACT GTGTCAGGCA TTGCAGTGGA CACGCATGTG CACAGAATCG 720

CCAACAGGCT GAGGTGGACC AAGAAGGCAA CCAAGTCCCC AGAGGAGACC CGCGCCGCCC 780

TGGAGGAGTG GCTGCCTAGG GAGCTGTGGC ACGAGATCAA TGGACTCTTG GTGGGCTTCG 840

GCCAGCAGAC CTGTCTGCCT GTGCACCCTC GCTGCCACGC CTGCCTCAAC CAAGCCCTCT 900

GCCCGGCCGC CCAGGGTCTC TGATGGCCGC ATGGCTCTGG CCGAGGTGCC GCTGTGGCCA 960

CCGTCTGTGA AGTGGCTTTA CGCTTCAGGA AGCCACGCCT GTTGAATAAA GCTTTGGTGT 1020

GTTTGCAAAA AAAAAAAAAA AAAAA 1045

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 304 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Thr Ala Leu Ser Ala Arg Met Leu Thr Arg Ser Arg Ser Leu Gly
1               5                   10                  15

Pro Gly Ala Gly Pro Arg Gly Cys Arg Glu Glu Pro Gly Pro Leu Arg
            20                  25                  30

Arg Arg Glu Ala Ala Ala Glu Ala Arg Lys Ser His Ser Pro Val Lys
        35                  40                  45

Arg Pro Arg Lys Ala Gln Arg Leu Arg Val Ala Tyr Glu Gly Ser Asp
    50                  55                  60

Ser Glu Lys Gly Glu Gly Ala Glu Pro Leu Lys Val Pro Val Trp Glu
65                  70                  75                  80

Pro Gln Asp Trp Gln Gln Gln Leu Val Asn Ile Arg Ala Met Arg Asn
                85                  90                  95

Lys Lys Asp Ala Pro Val Asp His Leu Gly Thr Glu His Cys Tyr Asp
            100                 105                 110

Ser Ser Ala Pro Pro Lys Val Arg Arg Tyr Gln Val Leu Leu Ser Leu
        115                 120                 125

Met Leu Ser Ser Gln Thr Lys Asp Gln Val Thr Ala Gly Ala Met Gln
    130                 135                 140

Arg Leu Arg Ala Arg Gly Leu Thr Val Asp Ser Ile Leu Gln Thr Asp
145                 150                 155                 160

Asp Ala Thr Leu Gly Lys Leu Ile Tyr Pro Val Gly Phe Trp Arg Ser
                165                 170                 175

Lys Val Lys Tyr Ile Lys Gln Thr Ser Ala Ile Leu Gln Gln His Tyr
            180                 185                 190

Gly Gly Asp Ile Pro Ala Ser Val Ala Glu Leu Val Ala Leu Pro Gly
        195                 200                 205

Val Gly Pro Lys Met Ala His Leu Ala Met Ala Val Ala Trp Gly Thr
    210                 215                 220

Val Ser Gly Ile Ala Val Asp Thr His Val His Arg Ile Ala Asn Arg
225                 230                 235                 240

Leu Arg Trp Thr Lys Lys Ala Thr Lys Ser Pro Glu Glu Thr Arg Ala
                245                 250                 255

Ala Leu Glu Glu Trp Leu Pro Arg Glu Leu Trp His Glu Ile Asn Gly
            260                 265                 270

Leu Leu Val Gly Phe Gly Gln Gln Thr Cys Leu Pro Val His Pro Arg
        275                 280                 285

Cys His Ala Cys Leu Asn Gln Ala Leu Cys Pro Ala Ala Gln Gly Leu
    290                 295                 300



(2) INFORMATION FOR SEQ ID NO : 3 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: YES 
(v) FRAGMENT TYPE: 



(Xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 : 

Leu Pro Gly Val Gly 

1 5 

(2) INFORMATION FOR SEQ ID NO : 4 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2 5 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer PI " 

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GTGGCACGAG ATCAATGGAC TCTTG 25

(2) INFORMATION FOR SEQ ID NO : 5 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthesized oiigo" 

(iii) HYPOTHETICAL: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5 : 
CGCGATACGC C 

11 

(2) INFORMATION FOR SEQ ID NO : 6 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 amino acids 

(B) TYPE: ammo acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(V) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6 : 

Trp Leu Pro Arg Xaa Leu Trp His Glu Ile Asn Gly Leu Leu Val Gly
1               5                   10                  15

Phe Gly Gln Gln Thr Cys Leu Pro Val His Pro Arg Cys His Ala Cys
            20                  25                  30

Leu Asn Gln Ala Leu Cys Pro Ala Ala Gln Gly Leu
        35                  40



(2) INFORMATION FOR SEQ ID NO : 7 : 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

{A) DESCRIPTION: /desc = "Primer P2 " 

(iii) HYPOTHETICAL: NO 



<xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7 : 
ATCATTGGAC TCTGGGTGGG C 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer P3 " 

(iii) HYPOTHETICAL: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 8 : 
CAACAGGCGT GGCTTCCTGA AGCG 
(2> INFORMATION FOR SEQ ID NO : 9 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2 5 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer P4 " 

(iii) HYPOTHET I CAL : NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
GGTGGGCTTC GGCCAGCAGA CCTGT

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2 5 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : Single 

(D) TOPOLOGY: linear 

(XI) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer T7" 

(iii) HYPOTHETICAL: NO 



(XI) SEQUENCE DESCRIPTION: SEQ ID NO:10: 
TAATACGACT CACTACTATA GGAGA 

25 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer SP6 - 

(iii) HYPOTHETICAL: NO 



(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
AGCTATTTAG GTGACACTAT AG 

22 

(2) INFORMATION FOR SEQ ID NO:12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2 5 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer P5" 

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
ACAGAGACTG CGTGTGGCCT ATGAG 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer P6 " 

(iii) HYPOTHETICAL: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
AAGAGAGCCT GCAGCAGAAG C 
(2) INFORMATION FOR SEQ ID NO: 14: 

( i ) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer P7" 

( iii ) HYPOTHETICAL : NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 14 : 
CACCTTGCTC CAGAAACC 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Pri

(iii) HYPOTHETICAL: NO



(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CATCAGTGAC AGCAGCACCT 

20 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2 0 base pairs 

(B) TYPE: nucleic acid 
fC) STRANDEDNESS : single 
(D) TOPOLOGY: linear 

t^a) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer P9" 

(111) HYPOTHETICAL: NO 



txa) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CATAGGCCAC ACGCAGTCTC 
(2) INFORMATION FOR SEQ ID NO: 17: 

( 1 ) SEQUENCE CHARACTERISTICS • 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer PiO' 

fiii) HYPOTHETICAL: NO 



20 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

CTTCTGCTGC AGCCTCTCTT C 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Primer Pii" 

(iii) HYPOTHETICAL: NO 



(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CTTGGATCCA TGCTGACCCG GAGCCGGAGC 

30 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3 0 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

( D ) TOPOLOGY : 1 inear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Primer P12" 

(iii) HYPOTHETICAL: NO 



<xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
CTCGAATTCG AGCCATGCGG CCCTCCGAGA 
(2) INFORMATION FOR SEQ ID NO : 2 0 : 

(i> SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 70 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

( D ) TOPOLOGY : 1 inear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(V) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Rattus rattus

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

His Arg He Ala Asn Arg Leu .ys Trp- Thr Xys Lys Met Thr Lys Ser 

10 ,5 

Pro Clu Glu Thr Arg Arg Asn Leu Clu Xaa Trp Leu Pro Arg Val Leu 

30 

T.P s„ al„ na 01. .eu val =1, Ph. oi, 

4 5 

..u V.1 Hi, P.O cy, =i„ «, e.s X.. „3 O.S 



60 



Pro Ala Ala Gin Gly Leu 
^5 70 

(2) INFORMATION FOR SEQ ID NO: 21: 

(1) SEQUENCE CHARACTERISTICS- 

(A) LENGTH: 17 amxno acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iil) HYPOTHETICAL: YES 
(v) FRAGMENT TYPE: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa
1               5                   10                  15

Cys



INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS - 

(A) LENGTH: 19 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
fD) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

fiii) HYPOTHETICAL: YES 

(v) FRAGMENT TYPE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa 
Xaa Xaa Cys 



(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 
<B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

ill) MOLECULE TYPE: peptide 
(111) HYPOTHETICAL: YES 

(v» FRAGMENT TYPE: internal 



SEQUENCE DESCRIPTION: SEQ ID NO:23: 

Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa 
5 10 15 

Cys 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: bovine

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

Gly Glu Gly Gly Glu Gly Ala Glu His Leu Gln Ala Pro
1               5                   10

(2) INFORMATION FOR SEQ ID NO : 2 5 : 



(1) SEQUENCE CHARACTERISTICS* 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
CD) TO POLOG Y : linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(V) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: bovine 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

Pro Val Asp Gln Leu Gly Ala Glu His Cys Phe Asp Pro Ser Ala
1               5                   10                  15

(2) INFORMATION FOR SEQ ID NO:26:



(i> SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2 3 amino acids 

(B) TYPE: ammo acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

( iii ) HYPOTHETICAL : NO 

(V) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: bovine 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Leu Thr Val Asp Ser Ile Leu Gln Thr Asp Asp Ser Thr Leu Gly Ala
1               5                   10                  15

Leu Ile Val Pro Val Gly Phe
            20

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS: 

CA) LENGTH: 14 amino acids 

(B) TYPE: ammo acid 

(C) STRANDEDNESS : single 
{ D ) TOPOLOGY : 1 i near 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: bovine 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

Gln Gly Thr Val Asn Gly Ile Ala Val Xaa Thr His Val Pro

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: bovine

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

Leu Trp Ser Glu Ile Asn Gly Leu Leu Val Gly Phe Gly Gln Gln Thr
1 5 10 15

Cys Leu Pro Ile Arg Pro
20

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 207 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Escherichia coli

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Met Asn Lys Ala Lys Arg Leu Glu Ile Leu Thr Arg Leu Arg Glu Asn
1 5 10 15

Asn Pro His Pro Thr Thr Glu Leu Asn Phe Ser Ser Pro Phe Glu Leu
20 25 30

Leu Ile Ala Val Leu Leu Ser Ala Gln Ala Thr Asp Val Ser Val Asn
35 40 45

Lys Ala Thr Ala Lys Leu Tyr Pro Val Ala Asn Thr Pro Ala Ala Met
50 55 60

Leu Glu Leu Gly Val Glu Gly Val Lys Thr Tyr Ile Lys Thr Ile Gly
65 70 75 80

Leu Tyr Asn Ser Lys Ala Glu Asn Ile Ile Lys Thr Cys Arg Ile Leu
85 90 95

Leu Glu Gln His Asn Gly Glu Val Pro Glu Asp Arg Ala Ala Leu Glu
100 105 110

Ala Leu Pro Gly Val Gly Arg Lys Thr Ala Asn Val Val Leu Asn Thr
115 120 125

Ala Phe Gly Trp Pro Thr Ile Ala Val Asp Thr His Ile Phe Arg Val
130 135 140

Cys Asn Arg Thr Gln Phe Ala Pro Gly Lys Asn Val Glu Gln Val Glu
145 150 155 160

Glu Lys Leu Leu Lys Val Val Pro Ala Glu Phe Lys Val Asp Cys His
165 170 175

His Trp Leu Ile Leu His Gly Arg Tyr Thr Cys Ile Ala Arg Lys Pro
180 185 190

Arg Cys Gly Ser Cys Ile Ile Glu Asp Leu Cys Glu Tyr Lys Glu
195 200 205

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 207 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Haemophilus influenzae

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Met Asn Lys Thr Lys Arg Ile Glu Ile Leu Thr Arg Leu Arg Glu Gln
1 5 10 15

Asn Pro His Pro Thr Thr Glu Leu Gln Tyr Asn Ser Pro Phe Glu Leu
20 25 30

Leu Ile Ala Val Ile Leu Ser Ala Gln Ala Thr Asp Lys Gly Val Asn
35 40 45

Lys Ala Thr Glu Lys Leu Phe Pro Val Ala Asn Thr Pro Gln Ala Ile
50 55 60

Leu Asp Leu Gly Leu Asp Gly Leu Lys Ser Tyr Ile Lys Thr Ile Gly
65 70 75 80

Leu Phe Asn Ser Lys Ala Glu Asn Ile Ile Lys Thr Cys Arg Asp Leu
85 90 95

Ile Glu Lys His Asn Gly Glu Val Pro Glu Asn Arg Glu Ala Leu Glu
100 105 110

Ala Leu Ala Gly Val Gly Arg Lys Thr Ala Asn Val Val Leu Asn Thr
115 120 125

Ala Phe Gly His Pro Thr Ile Ala Val Asp Thr His Ile Phe Arg Val
130 135 140

Cys Asn Arg Thr Asn Phe Ala Ala Gly Lys Asp Val Val Lys Val Glu
145 150 155 160

Glu Lys Leu Leu Lys Val Val Pro Asn Glu Phe Lys Val Asp Val His
165 170 175

His Trp Leu Ile Leu His Gly Arg Tyr Thr Cys Ile Ala Arg Lys Pro
180 185 190

Arg Cys Gly Ser Cys Ile Ile Glu Asp Leu Cys Glu Tyr Lys Glu
195 200 205

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 209 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus subtilis

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

31„ Xle p,, 

Phe P,o ^, 2, 

30 



V.I V.I V.I .1. 3„ oi„ THr «p „.i 

V.I r.. 

60 

jyr ..u „,i 

... TV, ^„ 2 



Ile Ile Glu Asp Tyr Gly Gly Glu Val Pro Arg Asp Arg Asp Glu Leu
100 105 110



val Lys Leu Pro Gly Val Gly Arg Lys Thr Ala Asn Val Val Val 



~ 
125 



v.. «a 01. V.1 P., « 

140 

Val ser Lys Arg Leu Gly He Cys Ara t^^ t 

14 5 150 ^ ^ ^""P -^^P ser Val Leu Glu 

155 

160

Val Glu Lys Thr Leu Met Arg Lys Val Pro Lys Glu Asp Trp Ser Val
165 170 175

Thr His His Arg Leu Ile Phe Phe Gly Arg Tyr His Cys Lys Ala Gln
180 185 190

Ser Pro Arg Cys Ala Glu Cys Pro Leu Leu Ser Leu Cys Arg Glu Gly
195 200 205

Gln



(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 204 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: M. jannaschii

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Met Leu Asn Val Ile Leu Leu Lys Lys Leu Asn Lys Asn Ala Val Val
1 5 10 15

Thr Glu Ile Ala Lys Asp Lys Asp Pro Phe Lys Val Leu Ile Ser Thr
20 25 30

Ile Ile Ser Ala Arg Thr Lys Asp Glu Val Thr Glu Glu Val Ser Lys
35 40 45

Lys Leu Phe Lys Glu Ile Lys Asp Val Asp Asp Leu Leu Asn Ile Asp
50 55 60

Glu Glu Lys Leu Ala Asp Leu Ile Tyr Pro Ala Gly Phe Tyr Lys Asn
65 70 75 80

Lys Ala Lys Asn Leu Lys Lys Leu Ala Lys Ile Leu Lys Glu Asn Tyr
85 90 95

Asn Gly Lys Val Pro Asp Ser Leu Glu Glu Leu Leu Lys Leu Pro Gly
100 105 110

Val Gly Arg Lys Thr Ala Asn Leu Val Ile Thr Leu Ala Phe Asn Lys
115 120 125
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ASP xie cys Val .sp Th. His Val. His .r, Xle Cys Asn Ar, Trp 

-^^^ 140 
Olu lie val .3p Thr Clu Xhr P.o Olu CIu Thr .1. PHe OXu .eu Arg 

2 Q q 

-^^^ 160 



Lys Lys Leu P.o Lys Lys Ty. Trp X^ys VaX Xle Asn A.n I.eu X.eu Val 

170 

val P.e Cly A., Olu Xle Cys Ser Ser .ys Se. .ys Cys Asp .ys Cys 

190 

Phe Lys Glu Ile Lys Glu Lys Cys Pro Tyr Tyr Glu
195 200

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 231 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: S. cerevisiae

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Arg Leu Met Arg Ser Lys Val Lys Thr Pro Val Asp Ala Met Gly Cys 

10 ^5 ^ 

30 

X>ys val ASP Pro .ys Asn P.e Arg X.eu Gl. Phe X.eu He Gly Thr Met 

40 45 

3„ M. Cl„ Th, .r, ^ oiu „„ 3,„ 

lie Thr Glu ^r Cys .eu Asn Thr Leu Lys Xle Ala Glu Gly Xle Thr 

75 

I^eu ASP Gly Leu Leu Lys Xle Asp Glu Pro Val Leu Ala Asn Leu xie 

90 95

Arg Cys Val Ser Phe Tyr Thr Arg Lys Ala Asn Phe Ile Lys Arg Thr
100 105 110

Ala Gln Leu Leu Val Asp Asn Phe Asp Ser Asp Ile Pro Tyr Asp Ile
115 120 125

Glu Gly Ile Leu Ser Leu Pro Gly Val Gly Pro Lys Met Gly Tyr Leu
130 135 140

Thr Leu Gln Lys Gly Trp Gly Leu Ile Ala Gly Ile Cys Val Asp Val
145 150 155 160

His Val His Arg Leu Cys Lys Met Trp Asn Trp Val Asp Pro Ile Lys
165 170 175

Cys Lys Thr Ala Glu His Thr Arg Lys Glu Leu Gln Val Trp Leu Pro
180 185 190

His Ser Leu Trp Tyr Glu Ile Asn Thr Val Leu Val Gly Phe Gly Gln
195 200 205

Leu Ile Cys Met Ala Arg Gly Lys Arg Cys Asp Leu Cys Leu Ala Asn
210 215 220

Asp Val Cys Asn Ala Arg Asn
225 230

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 230 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: S. cerevisiae

(B) STRAIN: See nFe-S

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Arg Val Leu Arg Ser Lys Ile Leu Ala Pro Val Asp Ile Ile Gly Gly
1 5 10 15



Ser Ser He Pro Val Th 



^ Ser Lys Cys Gly He Ser Lys Gli 

2 0 o c 

25 30

Gln Ile Ser Pro Arg Asp Tyr Arg Leu Gln Val Leu Leu Gly Val Met
35 40 45

Leu Ser Ser Gln Thr Lys Asp Glu Val Thr Ala Met Ala Met Leu Asn
50 55 60

Ile Met Arg Tyr Cys Ile Asp Glu Leu His Ser Glu Glu Gly Met Thr
65 70 75 80

Leu Glu Ala Val Leu Gln Ile Asn Glu Thr Lys Leu Asp Glu Leu Ile
85 90 95

His Ser Val Gly Phe His Thr Arg Lys Ala Lys Tyr Ile Leu Ser Thr
100 105 110

Cys Lys Ile Leu Gln Asp Gln Phe Ser Ser Asp Val Pro Ala Thr Ile
115 120 125

;^n Glu Leu Leu Gly Leu Pro Gly Val Gly Pro Lys Met Ala Tyr Leu 

140

Thr Leu Gln Lys Ala Trp Gly Lys Ile Glu Gly Ile Cys Val Asp Val
145 150 155 160

His Val Asp Arg Leu Thr Lys Leu Trp Lys Trp Val Asp Ala Gln Lys
165 170 175

Cys Lys Thr Pro Asp Gln Thr Arg Thr Gln Leu Gln Asn Trp Leu Pro
180 185 190

Lys Gly Leu Trp Thr Glu Ile Asn Gly Leu Leu Val Gly Phe Gly Gln
195 200 205

Ile Ile Thr Lys Ser Arg Asn Leu Gly Asp Met Leu Gln Phe Leu Pro
210 215 220

Pro Asp Asp Pro Gly Gly
225 230

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 213 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: S. pombe

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

Cys Lys Met Lys Ala Lys Val Val Ala Pro Val Asp Val Gln Gly Cys
1 5 10 15

His Thr Leu Gly Glu Arg Asn Asp Pro Lys Lys Phe Arg Phe Gln Thr
20 25 30

Leu Val Ala Leu Met Leu Ser Ser Gln Thr Lys Asp Ile Val Leu Gly
35 40 45

Pro Thr Met Arg Asn Leu Lys Glu Lys Leu Ala Gly Gly Leu Cys Leu
50 55 60

Glu Asp Ile Gln Asn Ile Asp Glu Val Ser Leu Asn Lys Leu Ile Glu
65 70 75 80

Lys Val Gly Phe His Asn Arg Lys Thr Ile Tyr Leu Lys Gln Met Ala
85 90 95

Arg Ile Leu Ser Glu Lys Phe Gln Gly Asp Ile Pro Asp Thr Val Glu
100 105 110

Asp Leu Met Thr Leu Pro Gly Val Gly Pro Lys Met Gly Tyr Leu Cys
115 120 125

Met Ser Ile Ala Trp Asn Lys Thr Val Gly Ile Gly Val Asp Val His
130 135 140

Val His Arg Ile Cys Asn Leu Leu His Trp Cys Asn Thr Lys Thr Glu
145 150 155 160

Glu Gln Thr Arg Ala Ala Leu Gln Ser Trp Leu Pro Lys Glu Leu Trp
165 170 175

Phe Glu Leu Asn His Thr Leu Val Gly Phe Gly Gln Thr Ile Cys Leu
180 185 190

Pro Arg Gly Arg Arg Cys Asp Met Cys Thr Leu Ser Ser Lys Gly Leu
195 200 205

Cys Pro Ser Ala Phe
210



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 207 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Escherichia coli

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Met Arg Lys Asp Met Ile Ala Pro Val Asp Thr Met Gly Cys His Lys
1 5 10 15

Leu Ala Asp Pro Leu Ala Ala Pro Pro Val His Arg Phe Gln Val Leu
20 25 30

Val Ala Leu Met Leu Ser Ser Gln Thr Arg Asp Glu Val Asn Ala Ala
35 40 45

Ala Met Lys Arg Leu Lys Asp His Gly Leu Ser Ile Gly Lys Ile Leu
50 55 60

Glu Phe Lys Val Pro Asp Leu Glu Thr Ile Leu Cys Pro Val Gly Phe
65 70 75 80

Tyr Lys Arg Lys Ala Val Tyr Leu Gln Lys Thr Ala Lys Ile Leu Lys
85 90 95

Asp Asp Phe Ser Gly Asp Ile Pro Asp Ser Leu Asp Gly Leu Cys Ala
100 105 110

Leu Pro Gly Val Gly Pro Lys Met Ala Asn Leu Val Met Gln Ile Ala
115 120 125

Trp Gly Glu Cys Val Gly Ile Ala Val Asp Thr His Val His Arg Ile
130 135 140

Ser Asn Arg Leu Gly Trp Ile Lys Thr Ser Thr Pro Glu Lys Thr Gln
145 150 155 160

Lys Ala Leu Glu Ile Leu Leu Pro Lys Ser Glu Trp Gln Pro Ile Asn
165 170 175

His Leu Leu Val Gly Phe Gly Gln Met Gln Cys Gln Pro Val Arg Pro
180 185 190

Lys Cys Gly Thr Cys Leu Cys Arg Phe Thr Cys Pro Ser Ser Thr
195 200 205

(2) INFORMATION FOR SEQ ID NO: 37: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 211 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE : 

(A) ORGANISM: Homo sapiens 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Arg Ala Met Arg Asn Lys Lys Asp Ala Pro Val Asp His Leu Gly Thr
1 5 10 15

Glu His Cys Tyr Asp Ser Ser Ala Pro Pro Lys Val Arg Arg Tyr Gln
20 25 30

Val Leu Leu Ser Leu Met Leu Ser Ser Gln Thr Lys Asp Gln Val Thr
35 40 45

Ala Gly Ala Met Gln Arg Leu Arg Ala Arg Gly Leu Thr Val Asp Ser
50 55 60

Ile Leu Gln Thr Asp Asp Ala Thr Leu Gly Lys Leu Ile Tyr Pro Val
65 70 75 80

Gly Phe Trp Arg Ser Lys Val Lys Tyr Ile Lys Gln Thr Ser Ala Ile
85 90 95

Leu Gln Gln His Tyr Gly Gly Asp Ile Pro Ala Ser Val Ala Glu Leu
100 105 110

Val Ala Leu Pro Gly Val Gly Pro Lys Met Ala His Leu Ala Met Ala
115 120 125

Val Ala Trp Gly Thr Val Ser Gly Ile Ala Val Asp Thr His Val His
130 135 140

Arg Ile Ala Asn Arg Leu Arg Trp Thr Lys Lys Ala Thr Lys Ser Pro
145 150 155 160

Glu Glu Thr Arg Ala Ala Leu Glu Glu Trp Leu Pro Arg Glu Leu Trp
165 170 175

His Glu Ile Asn Gly Leu Leu Val Gly Phe Gly Gln Gln Thr Cys Leu
180 185 190

Pro Val His Pro Arg Cys His Ala Cys Leu Asn Gln Ala Leu Cys Pro
195 200 205

Ala Ala Gln
210

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 211 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Escherichia coli

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

Met Asn Lys Ala Lys Arg Leu Glu Ile Leu Thr Arg Leu Arg Glu Asn
1 5 10 15

Asn Pro His Pro Thr Thr Glu Leu Asn Phe Ser Ser Pro Phe Glu Leu
20 25 30

Leu Ile Ala Val Leu Leu Ser Ala Gln Ala Thr Asp Val Ser Val Asn
35 40 45

Lys Ala Thr Ala Lys Leu Tyr Pro Val Ala Asn Thr Pro Ala Ala Met
50 55 60

Leu Glu Leu Gly Val Glu Gly Val Lys Thr Tyr Ile Lys Thr Ile Gly
65 70 75 80

Leu Tyr Asn Ser Lys Ala Glu Asn Ile Ile Lys Thr Cys Arg Ile Leu
85 90 95

Leu Glu Gln His Asn Gly Glu Val Pro Glu Asp Arg Ala Ala Leu Glu
100 105 110

Ala Leu Pro Gly Val Gly Arg Lys Thr Ala Asn Val Val Leu Asn Thr
115 120 125

Ala Phe Gly Trp Pro Thr Ile Ala Val Asp Thr His Ile Phe Arg Val
130 135 140

Cys Asn Arg Thr Gln Phe Ala Pro Gly Lys Asn Val Glu Gln Val Glu
145 150 155 160

Glu Lys Leu Leu Lys Val Val Pro Ala Glu Phe Lys Val Asp Cys His
165 170 175

His Trp Leu Ile Leu His Gly Arg Tyr Thr Cys Ile Ala Arg Lys Pro
180 185 190

Arg Cys Gly Ser Cys Ile Ile Glu Asp Leu Cys Glu Tyr Lys Glu Lys
195 200 205

Val Asp Ile
210



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 259 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: C. elegans 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

Met Arg Lys Asp Met Ile Ala Pro Val Asp Thr Met Gly Cys His Lys
1 5 10 15

Leu Ala Asp Pro Leu Ala Ala Pro Pro Val His Arg Phe Gln Val Leu
20 25 30

Val Ala Leu Met Leu Ser Ser Gln Thr Arg Asp Glu Val Asn Ala Ala
35 40 45

Ala Met Lys Arg Leu Lys Asp His Gly Leu Ser Ile Gly Lys Ile Leu
50 55 60

Glu Phe Lys Val Pro Asp Leu Glu Thr Ile Leu Cys Pro Val Gly Phe
65 70 75 80

Tyr Lys Arg Lys Ala Val Tyr Leu Gln Lys Thr Ala Lys Ile Leu Lys
85 90 95

Asp Asp Phe Ser Gly Asp Ile Pro Asp Ser Leu Asp Gly Leu Cys Ala
100 105 110

Leu Pro Gly Val Gly Pro Lys Met Ala Asn Leu Val Met Gln Ile Ala
115 120 125

Trp Gly Glu Cys Val Gly Ile Ala Val Asp Thr His Val His Arg Ile
130 135 140

Ser Asn Arg Leu Gly Trp Ile Lys Thr Ser Thr Pro Glu Lys Thr Gln
145 150 155 160



Lys Ala Leu Glu Ile Leu Leu Pro Lys Ser Glu Trp Gln Pro Ile Asn
165 170 175

His Leu Leu Val Gly Phe Gly Gln Met Gln Cys Gln Pro Val Arg Pro
180 185 190

Lys Cys Gly Thr Cys Leu Cys Arg Phe Thr Cys Pro Ser Ser Thr Ala
195 200 205

Lys Asn Val Lys Ser Glu Thr Glu Glu Thr Ser Thr Ser Ile Glu Val
210 215 220

Lys Gln Glu Val Glu Asp Glu Phe Glu Asp Glu Lys Pro Ala Lys Lys
225 230 235 240

Ile Lys Lys Thr Arg Lys Thr Arg Thr Lys Ile Glu Val

Ser Glu Thr 



250 



Lys Thr Glu 
255 



(2) INFORMATION FOR SEQ ID NO:40: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1046 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

CCCACGCGTC CCGCCGCCTT GAGCGCGAGG ATGCTGACCC GGAGCCGGAG CCTGGGACCC 60
GGGGCTGGGC CGCGGGGGTG TAGGGAGGAG CCCGGGCCTC TCCGGAGAAG AGAGGCTGCA 120
GCAGAAGCGA GGAAAAGCCA CAGCCCCGTG AAGCGTCCGC GGAAAGCACA GAGACTGCGT 180
GTGGCCTATG AGGGCTCGGA CAGTGAGAAA GGTGAGGGGG CTGAGCCCCT CAAGGTGCCA 240
GTCTGGGAGC CCCAGGACTG GCAGCAACAG CTGGTCAACA TCCGTGCCAT GAGGAACAAA 300
AAGGATGCAC CTGTGGACCA TCTGGGGACT GAGCACTGCT ATGACTCCAG TGCCCCCCCA 360
AAGGTACGCA GGTACCAGGT GCTGCTGTCA CTGATGCTCT CCAGCCAAAC CAAAGACCAG 420
GTGACGGCGG GCGCCATGCA GCGACTGCGG GCGCGGGGCC TGACGGTGGA CAGCATCCTG 480
CAGACAGATG ATGCCACGCT GGGCAAGCTC ATCTACCCCG TCGGTTTCTG GAGGAGCAAG 540
GTGAAATACA TCAAGCAGAC CAGCGCCATC CTGCAGCAGC ACTACGGTGG GGACATCCCA 600
GCCTCTGTGG CCGAGCTGGT GGCGCTGCCG GGTGTTGGGC CCAAGATGGC ACACCTGGCT 660
ATGGCTGTGG CCTGGGGCAC TGTGTCAGGC ATTGCAGTGG ACACGCATGT GCACAGAATC 720
GCCAACAGGC TGAGGTGGAC CAAGAAGGCA ACCAAGTCCC CAGAGGAGAC CCGCGCCGCC 780
CTGGAGGAGT GGCTGCCTAG GGAGCTGTGG CACGAGATCA ATGGACTCTT GGTGGGCTTC 840
GGCCAGCAGA CCTGTCTGCC TGTGCACCCT CGCTGCCACG CCTGCCTCAA CCAAGCCCTC 900
TGCCCGGCCG CCCAGGGTCT CTGATGGCCG CATGGCTCTG GCCGAGGTGC CGCTGTGGCC 960
ACCGTCTGTG AAGTGGCTTT ACGCTTCAGG AAGCCACGCC TGTTGAATAA AGCTTTGGTG 1020
TGTTTGCAAA AAAAAAAAAA AAAAAA 1046

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 894 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

ATGCTGACCC GGAGCCGGAG CCTGGGACCC GGGGCTGGGC CGCGGGGGTG TAGGGAGGAG 60

CCCGGGCCTC TCCGGAGAAG AGAGGCTGCA GCAGAAGCGA GGAAAAGCCA CAGCCCCGTG 120

AAGCGTCCGC GGAAAGCACA GAGACTGCGT GTGGCCTATG AGGGCTCGGA CAGTGAGAAA 180

GGTGAGGGGG CTGAGCCCCT CAAGGTGCCA GTCTGGGAGC CCCAGGACTG GCAGCAACAG 240

CTGGTCAACA TCCGTGCCAT GAGGAACAAA AAGGATGCAC CTGTGGACCA TCTGGGGACT 300

GAGCACTGCT ATGACTCCAG TGCCCCCCCA AAGGTACGCA GGTACCAGGT GCTGCTGTCA 360

CTGATGCTCT CCAGCCAAAC CAAAGACCAG GTGACGGCGG GCGCCATGCA GCGACTGCGG 420

GCGCGGGGCC TGACGGTGGA CAGCATCCTG CAGACAGATG ATGCCACGCT GGGCAAGCTC 480

ATCTACCCCG TCGGTTTCTG GAGGAGCAAG GTGAAATACA TCAAGCAGAC CAGCGCCATC 540

CTGCAGCAGC ACTACGGTGG GGACATCCCA GCCTCTGTGG CCGAGCTGGT GGCGCTGCCG 600

GGTGTTGGGC CCAAGATGGC ACACCTGGCT ATGGCTGTGG CCTGGGGCAC TGTGTCAGGC 660

ATTGCAGTGG ACACGCATGT GCACAGAATC GCCAACAGGC TGAGGTGGAC CAAGAAGGCA 720

ACCAAGTCCC CAGAGGAGAC CCGCGCCGCC CTGGAGGAGT GGCTGCCTAG GGAGCTGTGG 780

CACGAGATCA ATGGACTCTT GGTGGGCTTC GGCCAGCAGA CCTGTCTGCC TGTGCACCCT 840

CGCTGCCACG CCTGCCTCAA CCAAGCCCTC TGCCCGGCCG CCCAGGGTCT CTGA 894
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 297 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: C-terminal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

Met Leu Thr Arg Ser Arg Ser Leu Gly Pro Gly Ala Gly Pro Arg Gly
1 5 10 15

Cys Arg Glu Glu Pro Gly Pro Leu Arg Arg Arg Glu Ala Ala Ala Glu
20 25 30

Ala Arg Lys Ser His Ser Pro Val Lys Arg Pro Arg Lys Ala Gln Arg
35 40 45

Leu Arg Val Ala Tyr Glu Gly Ser Asp Ser Glu Lys Gly Glu Gly Ala
50 55 60

Glu Pro Leu Lys Val Pro Val Trp Glu Pro Gln Asp Trp Gln Gln Gln
65 70 75 80

Leu Val Asn Ile Arg Ala Met Arg Asn Lys Lys Asp Ala Pro Val Asp
85 90 95

His Leu Gly Thr Glu His Cys Tyr Asp Ser Ser Ala Pro Pro Lys Val
100 105 110

Arg Arg Tyr Gln Val Leu Leu Ser Leu Met Leu Ser Ser Gln Thr Lys
115 120 125

Asp Gln Val Thr Ala Gly Ala Met Gln Arg Leu Arg Ala Arg Gly Leu
130 135 140

Thr Val Asp Ser Ile Leu Gln Thr Asp Asp Ala Thr Leu Gly Lys Leu
145 150 155 160

Ile Tyr Pro Val Gly Phe Trp Arg Ser Lys Val Lys Tyr Ile Lys Gln
165 170 175

Thr Ser Ala Ile Leu Gln Gln His Tyr Gly Gly Asp Ile Pro Ala Ser
180 185 190

Val Ala Glu Leu Val Ala Leu Pro Gly Val Gly Pro Lys Met Ala His
195 200 205

Leu Ala Met Ala Val Ala Trp Gly Thr Val Ser Gly Ile Ala Val Asp
210 215 220

Thr His Val His Arg Ile Ala Asn Arg Leu Arg Trp Thr Lys Lys Ala
225 230 235 240

Thr Lys Ser Pro Glu Glu Thr Arg Ala Ala Leu Glu Glu Trp Leu Pro
245 250 255

Arg Glu Leu Trp His Glu Ile Asn Gly Leu Leu Val Gly Phe Gly Gln
260 265 270

Gln Thr Cys Leu Pro Val His Pro Arg Cys His Ala Cys Leu Asn Gln
275 280 285

Ala Leu Cys Pro Ala Ala Gln Gly Leu
290 295

WHAT IS CLAIMED IS:

1. A mammalian endonuclease III purified greater than about 500-fold, which
endonuclease III demonstrates pyrimidine hydrate DNA-glycosylase activity, thymine glycol
DNA-glycosylase activity, and lyase activity, and reductively cross links with a thymine
glycol containing oligodeoxynucleotide.

2. The endonuclease III of Claim 1, wherein the endonuclease is purified greater than
about 5000-fold.

3. The endonuclease III of Claim 1, wherein the endonuclease in 100 mM NaCl elutes
from a 1 ml single stranded-DNA-cellulose chromatography column eluted with a 12.5 ml
gradient of 100 to 600 mM NaCl at 0.2 ml/min in about fractions 12-18.

4. The endonuclease III of Claim 2 which elutes in about fractions 15-17.

5. The endonuclease III of Claim 1 which has a molecular radius of 29 kDa as
determined by gel filtration.

6. The endonuclease III of Claim 1 which has a predominant molecular weight of 31
kDa as determined by SDS-PAGE analysis.

7. The endonuclease III of Claim 1 which has a partial amino acid sequence selected
from the group consisting of SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID
NO:28, SEQ ID NO:6, and SEQ ID NO:20.

8. The endonuclease III of Claim 1 which has an amino acid sequence selected from the
group consisting of bovine endonuclease III, human endonuclease III, and rat endonuclease
III.

9. An endonuclease III having an amino acid sequence corresponding to SEQ ID NO:2.

10. A purified nucleic acid encoding a mammalian endonuclease III, which endonuclease
III demonstrates pyrimidine hydrate DNA-glycosylase activity, thymine glycol DNA-glycosylase
activity, and lyase activity, and reductively cross links with a thymine glycol
containing oligodeoxynucleotide.

11. The purified nucleic acid of Claim 10, wherein the endonuclease in 100 mM NaCl
elutes from a 1 ml single stranded-DNA-cellulose chromatography column eluted with a 12.5
ml gradient of 100 to 600 mM NaCl at 0.2 ml/min in about fractions 12-18.

12. The purified nucleic acid of Claim 10, wherein the endonuclease III elutes in about
fractions 15-17.

13. The purified nucleic acid of Claim 10, wherein the endonuclease III has a molecular
radius of 29 kDa as determined by gel filtration.

14. The purified nucleic acid of Claim 10, wherein the endonuclease III has a
predominant molecular weight of 31 kDa as determined by SDS-PAGE analysis.

15. The purified nucleic acid of Claim 10, wherein the endonuclease III has a partial
amino acid sequence selected from the group consisting of SEQ ID NO:25, SEQ ID NO:26,
SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:6, and SEQ ID NO:28.

16. The purified nucleic acid of Claim 10, wherein the endonuclease III has an amino
acid sequence selected from the group consisting of bovine endonuclease III, human
endonuclease III, and rat endonuclease III.

17. The purified nucleic acid of Claim 10 which has a nucleotide sequence
corresponding or complementary to the nucleotide sequence selected from the group
consisting of SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID
NO:6, and SEQ ID NO:28.

18. The purified nucleic acid of Claim 10 which is DNA encoding the mammalian
endonuclease III.

19. A recombinant DNA vector comprising the DNA of Claim 18.

42. The method according to Claim 40, wherein the expression vector is a naked DNA
expression vector.

43. The method according to Claim 40, wherein the expression vector is introduced into
tissue exposed to radiation prior to exposure to the radiation.

44. The method according to Claim 40, wherein the expression vector is introduced into
tissue exposed to radiation after exposure to the radiation.

45. A method for treating a disease or disorder associated with DNA damage in a
mammal, comprising increasing the level of mammalian endonuclease III in cells
demonstrating DNA damage, wherein the endonuclease III demonstrates pyrimidine hydrate
DNA-glycosylase activity, thymine glycol DNA-glycosylase activity, and lyase activity, and
reductively cross links with a thymine glycol containing oligodeoxynucleotide.

46. The method according to Claim 45, wherein the level of mammalian endonuclease
III is increased by administration of purified endonuclease III to the cells demonstrating
DNA damage.

47. The method according to Claim 45, wherein the level of mammalian endonuclease
III is increased by administration of a recombinant expression vector to the cells
demonstrating DNA damage, which expression vector provides for expression of the
mammalian endonuclease III in vivo.

48. The method according to Claim 47, wherein the expression vector is a viral
expression vector.

49. The method according to Claim 47, wherein the expression vector is a naked DNA
expression vector.

50. The method according to Claim 47, wherein the expression vector is introduced into
tissue exposed to radiation prior to exposure to the radiation.

62. An antibody that specifically binds to the mammalian endonuclease III of Claim 1.

63. The antibody of Claim 62 which is polyclonal.

64. The antibody of Claim 62 which is monoclonal.

65. An antibody that specifically binds to the human endonuclease III of Claim 9.

66. The antibody of Claim 65 which is polyclonal.

67. The antibody of Claim 65 which is monoclonal.

68. A fusion protein comprising a mammalian endonuclease III.

2 elnucra'r ^ — ^ 

70. The fusion protein of Claim 69, wherein the human endonuclease III has an amino
acid sequence of SEQ ID NO:42.

71. The fusion protein of Claim 70, where the human endonuclease III is fused to
glutathione S-transferase.

Figure 2

Figure 3

Human Endonuclease III cDNA Sequence,
Length: 1046 bp

CCCACGCGTC CCGCCGCCTT GAGCGCGAGG ATGCTGACCC GGAGCCGGAG
CCTGGGACCC GGGGCTGGGC CGCGGGGGTG TAGGGAGGAG CCCGGGCCTC 
TCCGGAGAAG AGAGGCTGCA GCAGAAGCGA GGAAAAGCCA CAGCCCCGTG 
AAGCGTCCGC GGAAAGCACA GAGACTGCGT GTGGCCTATG AGGGCTCGGA 
CAGTGAGAAA GGTGAGGGGG CTGAGCCCCT CAAGGTGCCA GTCTGGGAGC 
CCCAGGACTG GCAGCAACAG CTGGTCAACA TCCGTGCCAT GAGGAACAAA 
AAGGATGCAC CTGTGGACCA TCTGGGGACT GAGCACTGCT ATGACTCCAG 
TGCCCCCCCA AAGGTACGCA GGTACCAGGT GCTGCTGTCA CTGATGCTCT 
CCAGCCAAAC CAAAGACCAG GTGACGGCGG GCGCCATGCA GCGACTGCGG 
GCGCGGGGCC TGACGGTGGA CAGCATCCTG CAGACAGATG ATGCCACGCT 
GGGCAAGCTC ATCTACCCCG TCGGTTTCTG GAGGAGCAAG GTGAAATACA 
TCAAGCAGAC CAGCGCCATC CTGCAGCAGC ACTACGGTGG GGACATCCCA 
GCCTCTGTGG CCGAGCTGGT GGCGCTGCCG GGTGTTGGGC CCAAGATGGC 
ACACCTGGCT ATGGCTGTGG CCTGGGGCAC TGTGTCAGGC ATTGCAGTGG 
ACACGCATGT GCACAGAATC GCCAACAGGC TGAGGTGGAC CAAGAAGGCA 
ACCAAGTCCC CAGAGGAGAC CCGCGCCGCC CTGGAGGAGT GGCTGCCTAG 
GGAGCTGTGG CACGAGATCA ATGGACTCTT GGTGGGCTTC GGCCAGCAGA 
CCTGTCTGCC TGTGCACCCT CGCTGCCACG CCTGCCTCAA CCAAGCCCTC 
TGCCCGGCCG CCCAGGGTCT CTGATGGCCG CATGGCTCTG GCCGAGGTGC 
CGCTGTGGCC ACCGTCTGTG AAGTGGCTTT ACGCTTCAGG AAGCCACGCC 
TGTTGAATAA AGCTTTGGTG TGTTTGCAAA AAAAAAAAAA AAAAAA

Figure 11

Figure 14